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This  report  is  the  summary  of  my  studies  in  the  area 
of  sparse  matrices  and  the  results  of  the  progr^uns  which  I 
wrote.  Although  I confined  my  analyses  to  Gaussian  solution 
schemes,  I wrote  the  text  so  that  a follow-on  student  can 
easily  apply  some  of  my  recommendations  and  procedures  to 
other  s(>arse  matrix  techniques.  1 also  detailed  the  think- 
ing which  I us '3d  to  build  my  algorithm;  such  an  algorithm  is 
not  widely  used  for  sparse  matrix  solutions  because  of  limi- 
tations of  many  popular  computers.  But  because  of  some 
novel  techniques  which  I used  auid  the  strong  arithmetic  capa- 
bilities of  the  AFIT  CDC  6600  Computer,  I feel  that  my  algo- 
rithm may  be  of  great  use  to  engineers  and  physicists. 

I wish  to  acknowledge  the  guidance  of  my  laboratory 
sponsor,  Capt.  Carl  E.  Oliver  of  the  Air  Force  Weapons  Lab, 
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clearly  define  the  thesis  objectives;  one  of  Capt.  Oliver's 
co-workers,  Mr.  Mark  Gatti,  provided  excellent  and  timely 
support  in  part  of  the  test  phase  of  this  project.  I fur- 
ther wish  to  thank  my  thesis  advisor.  Prof.  Bernard  Kaplan, 
whose  vast  ^ixperience  in  Numerical  Analysis  was  a most  valu- 
able source  in  the  formulation  of  my  algorithm.  Finally,  I 
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CowDuter  Program  Information 


The  following  is  a suinin.iLry  of  the  programs  developed 
by  this  student  as  part  of  this  thesis.  The  algorithms  of 
these  programs  are  directly  suited  to  the  CDC  6600  Computer 
at  the  Air  Force  Institute  of  Technology. 

All  of  the  programs  are  written  in  CDC  FORTRAN  Extended, 
Version  Listings  of  these  programs  may  be  obtained  from 
the  AFIT  Computer  Archive,  AFIT/AD,  Wright-Patterson  AFB, 

CW,  45433. 


1.  MFP  - A Gaussian  Elimination  sparse  matrix 
solver  with  various  strategic  pivoting  schemes. 

2.  MFPTH  > A Gaussian  Elimination  sparse  matrix 
solver  with  a consecutively  calculated  pivoting 
strategy. 

3.  MFPOP  - The  same  program  as  MFPTH  except  that 
the  user  can  choose  either  the  computed  pivoting 
strategy  or  an  a priori  strategy  depending  on 
the  circumstances  of  his  particular  problem. 

4.  CEBIT  - This  program  represents  the  same 
capabilities  as  MFPOP  except  that  a new 
packing  scheme  is  used. 

5.  SMART  - The  same  program  as  CEBIT  except  the 
modular  subroutines  used  in  the  pregx  jd  devel- 
opment are  replaced  by  prograun  statements 
within  the  sparse  solver  itself.  This  program 
is  the  production  model  of  the  Gaussian  Elimi- 
nation algorithm  developed  in  this  thesis. 

A user’s  guide  to  the  prcgraia  SMART  can  be  obtained  from  the 

AFIT  Physics  Department. 
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Abstract 

A comparison  is  made  of  the  merits  of  three  popular 
algorithms  for  direct  solutions  of  large,  sparse  matrices! 
Gaussian  Elimination,  LU  Decomposition,  and  Gauss-Jordaui  Re- 
duction. The  last  tv.'o  algorithms  are  used  in  exi'^ting 
sparse  matrix  solvers  at  the  Air  Force  Weap)ons  Lab,  Kirt'  >nd 
AFB,  NM.  \ mathematical  theory  discussion  explains  the  al- 
gorithms and  predicts  their  performance  for  arbitrary  and 
strongly  structured  matrices.  The  performance  comparison 
involves  a wide  range  of  problems  practical  to  technical 
study  at  the  Weapons  Lab.  Particular  emphasis  is  placed  on 
solution  accuracy  and  the  efficient  use  of  core  space.  The 
same  test  problems  are  used  to  analyze  the  Gaussian  Elimina- 
tion algorithm  programmed  by  this  student.  From  a istudy  of 
the  performance  of  several  Gaussian  solution  strategies,  a 
new  strategy  is  developed  which  offers  the  user  a range  of 
options  for  his  particular  programming  needs.  The  salient 
points  of  this  strategy  include  some  stability  features  of 
partial  pivoting  and  some  array  optimization  similar  to  roin- 
inum  row/minimuro  column  pivoting.  The  final  Gaussian  Elimi- 
nation program  is  enhanced  by  a new  packing  scheme  which  is 
highly  suited  for  the  CDC  6)600  computers  many  ax  .ays  can  be 
coiapacted  into  a single  array  by  subdividing  the  long  com- 
puter word  structure.  A final  qualitative  comparison  is 
presented  from  which  an  optimal  solution  method  is  proposed 
and  further  study  recommended. 
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DIRECT  SOLUTIONS  OP  LARGS,  SPARSE  LINEAR  SYSTEMS 


I.  intgff4wfi.t4gn 


ClAAsical  LiiMAX  AlgsbXA  d«fiiMs  * ”lin«ar  systca”  as 
OM  whose  aodsl  esn  bs  rspressntsd  by  the  estrix  equetion 

^ k (1) 

where  is  sn  "n-by-n"  systew  wstrix,  2I  * colusm  vector 
of  solutions,  end  is  s coluwn  vector  of  constsnts.  A 
"qtATse"  systew  is  understoed  to  be  one  whose  non-sero  ele* 
■ents  of  the  ^ wstrix  are  fewt  no  wore  than  5H  (and  typic:'^!- 
ly  less  than  1%)  of  the  total  nuwber  of  possible  entries. 
Sparse  watrices  are  ususdly  associated  with  systewa  whose 
sisw  (or  "rank*)  is  very  large  (about  a thousand). 

Large,  sparse  systews  of  equations  cows  as  a result  of 
work  in  wany  fields  1 physics,  engineering,  and  business  wan- 
agswsnt.  In  technical  fields,  a frequent  use  of  sparse 
watrix  techniques  is  in  the  approxiwation  of  the  solutions 
of  differential  equations.  Frequently,  the  variables  of 
differential  equations  way  not  be  neparable,  or  tho  geowetry 
of  sosis  problew  nay  not  be  described  by  siBq>]e,  algebraic 
fuartionsi  under  these  kinds  of  conditions,  classical  tech- 
niques for  solving  differential  equations  cannot  be  used}  an 
approxiwstion  nethod  is  necessary. 

The  folJosring  is  an  illustration  of  how  a systes  (which 
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happens  to  h*v«  a straightforward  analytic  solution)  can  b« 
solved  a finite  difference  technique  which  yields  a 
sparse  natrix  (Ref  lit  149,  233-261). 

Oiveni  A very  long  rod  whose  cross-section  is  a unit 
square,  and  whose  heat  generation  is  unifom  froe  within. 
Problea;  To  solve  for  the  tesperature  distribution  along 
the  x-axis  for  the  indicated  boundary  conditions. 

Solutions  The  governing  partial  differential  equation 
is 


4^ 


♦ 


♦ 1 


O (2) 


where  *u”  is  a normalised  tesperature  parameter.  The 
boundary  conditions  are 


u(l,y)  « O 
^O.y)  . O 


tt(x,X)  • O 
^x.O)  . O 


(3) 
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The  "exact*  analytical  solution  is 


. 1 . . £ I 1)"*^  cort.[(2n*l)(V2)x1 ^ 

(3n.l)^  co.h[(2n.l)ir/3] 

Tb  approximate  the  partial  differential  equation,  a 
set  of  nodal  points  (Fig.  1)  is  defined  in  the  region  of 
interest.  The  distance  bets*eea  adjacent  points  in  the  x 
and  y directions  are  Ax  and  ^ respectively.  The  partissl 
differential  equation  is  approximated  in  ttse  following 


a 


'wr 


' 


(1,4)  .(2,4)  .(3,4)  .(4,4) 


(1,3)  .(2,3)  .(3,3)  .(4,3) 


(1,2)  .(2,2)  .(3,2)  .(4,2) 


Pig.  1.  Nodal  Point  distribution  (Proa  Rsf  lit 255) 


Mbowt  any  ?)Od«  u(a,n). 


^•l.n  ” *“a.n  * “a^l.i 


iy  substituting  Eqs  (5)  and  (6)  back  into  Eq  (2),  and  allow 
ing  Ax  ■ Ay,  ths  following  equation  results i 

♦ *Si,o  -"Wl.n  • 

Wy  invoking  characteristic  ssmaetry  of  these  approxiaations 
and  noting  that  in  this  case  Ax  ■ 1/4,  the  sparse  systea 
depicted  in  Pig.  2 is  the  result. 


Fig«  2.  Spaurse  Matrix  generated  froiu  Eq  (7)  (From  Ref  11:255) 
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It  is  inportsnt.  note  that  to  solve  for  only  four 
unknown  nodes  on  the  x-axi.s  (as  the  problem  stated),  it 
takes  16  equations.  Far  more  rougher  a^roximations  yield 
less  accurate  results;  Fig.  3 demonstrates  the  effects  of 
an  iqrjproximation  with  Ax  » 1.0,  0.5,  and  the  chosen  0.25  of 
this  example.  Clearly,  as  Ax  gets  smaller,  the  nodal  solu- 
tions are  distributed  more  closely  to  the  actual  analytic 
solution.  As  a consequence  of  shrinking  the  sise  of  Ax, 
however,  the  number  of  equations  increases  very  quickly. 

Thus  for  a near -perfect  approximation,  a very  large  number 
of  equations  is  necessary  (hence  the  development  of  large . 
sparse  matrices). 

In  the  above  example,  the  curve  for  Ax  ■ 0.25  is  indeed 
very  close  to  the  "exact"  solution  (Fig.  3);  this  "good"  ap- 
proximation m»y  suggest  that  only  16  equations  are  needed 
for  a reasonably  accurate  solution  (as  opposed  to  the  1000 
equations  suggested  above).  However,  had  the  geometry  been 
more  arbitrary,  the  nodal  elements  of  16  equations  would  have 
not  provided  adequate  detail  at  the  boundary.  Fig.  4 is  pro- 
posed as  such  an  example.  To  attain  a good  approximation,  a 
vast  increase  in  the  nodal  density  is  required.  In  any  case, 
the  use  of  sparse  matrix  approximation  for  the  problem  of 
Fig.  4 is  auch  preferred  to  an  analytic  solution. 

A frequent  consequence  of  spaurse  matrix  construction  is 
that  the  non-sero  elements  are  distributed  in  an  predominant- 
ly diagonal  structure  with  flanking  diagonals.  The  system 
in  Fig.  2 illustrates  a tridiagonaJL  core  structure  with  two 
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fX«nking  diagonals  each  displaced  three  diagonals  away  from 
the  wain  diagonal.  These  "banded"  structures  occur  often  in 
physics  and  engineering  problems. 

Although  linear  algebraic  equations  are  theoretically 
■ore  easy  to  solve  than  differential  equations » because  of 
the  potentially  vast  nuMber  of  equations,  the  systea  Must  be 
solved  in  a digital  cosputeri  the  problem  is  to  handle  the 
characteristically  large  amounts  of  data  as  effectively  as 
possible.  As  an  illustration  of  the  problem,  a system  of 
sise  1000  would  require  a million  locations  in  the  computer 
core  for  data  alonet  since  500,000  locations  is  a typical 
i^sper  bound  for  most  large  coaqmters,  the  entire  A aAtrix 
could  not  be  stored.  Therefore,  packing  routines  must  t»e 
written  whicJi  need  to  store  ovv\y  the  non-sero  values  of 
in  this  case,  a sparse  matrix  of  sise  1000  at  5fl  sparsity 


7 


trould  need  no  more  then  50,000  locations  to  pack  the  non* 

*seros. 

The  actual  solution  Method  aust  also  be  chosen  to  main- 
tain sparseness  as  much  as  possible  as  the  program  runs.  A 
result  of  the  classical  inversion  solution, 

s - <e) 

i.  that  ia  vary  dana.  and  will  damand  cora  in  axceas  of 
that  available. 

Even  with  an  ideal  packing  scheme,  one  cannot  assume 
perfect  algebraic  accuracy  in  any  computer;  the  conception 
and  growth  of  errors  is  a very  important  consideration  in 
the  construction  of  the  sparse  matrix  solver* 

Another  factor  bearing  on  the  problem  Is  the  growth  of 
the  ^ matrix  as  it  is  coasted;  new  non-xeros  (called  "fill- 
in*)  may  be  manifested  and,  in  some  circumstances,  force  the 
data  storage  requirements  beyond  the  limits  allowed. 

Ttve  pursuit  of  the  solution  to  this  problem  is  the 
theme  of  this  thesis. 

The  following  objectives  were  defined  for  this  project 
as  a result  of  the  motivation  of  the  utility  of  finite  dif- 
ference techniques  and  the  guidelines  of  the  problem  state- 
ments 

ftnBirliPii  Mfftinfl  fMkm  Hitrix  89J.Yt««  ^ 
sparse  matrix  algorithms,  already  in  use  at  the  Air  Pores 
Ifsspons  Laboratory,  were  to  be  compared.  The  desired 
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i- 

ii 
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outcoM  WAS  to  find  those  classes  of  problems  which  each  al- 
gorithm solves  the  best.  The  programs  compared  are  The  Yale 
Sparse  Matrix  Package  by  Sherman  (Ref  14)  and  a program 
called  "SIMULT"  by  Key  (Ref  9). 

A New  Packing  Scheme.  A third  sparne  matrix  solver  was 
programmed  as  part  of  this  thesis  with  which  the  existing 
programs  were  ccmpared.  A new  packing  scheme  was  developed 
in  an  effort  to  e3q>loit  the  sparseness  of  the  test  matrices 
and  more  efficiently  use  the  allotted  core  storage. 

The  result  of  the  accomplishment  of  these  objectives 
was  a choice  from  the  three  programs  of  the  "most  desirable" 
sparse  matrix  solver  as  a computational  tool. 

Standards 

There  were  three  significant  measures  of  performance 
readily  available  on  the  coiqputer  printouts  t one  otk.sr  crite- 
rion was  rather  intangible t but  nevertheless,  important. 

The  criteria  used  in  judging  the  sparse  matrix  solution 

methods  were 

1.  Accuracy I 

2.  Core  storage  requirements} 

3.  Execution  time} 

4.  The  degree  that  a routine  met  the  user's  needs. 

The  fourth  criterion  was  ijqportant  in  that  the  outcome  of 
the  thesis  pertains  to  engineering  problems  and  not  to  ma- 
trices which  are  ^pawned  by  mere  academic  curiosity. 

The  performance  of  tlie  sparse  matrix  routine  developed 
as  part  of  this  thesis  was  compared  to  the  existing  sparse 
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fiolv«rtt  the  nolutions  given  by  Sherxuui  and  Key's  programs 


were  used  as  a "perfonaance  frame  of  reference." 


SSgPf 

The  analyses  of  this  thesis  were  limited  to  the  per* 
formances  of  the  three  routines  on  strictly  non>singular, 
square  arrays.  The  best  particular  solution  for  ^ was  the 
goal  of  each  computer  program  as  op>po8ed  to  the  eigenvalue 
problem. 

There  are  two  basic  Mthods  of  sparse  matrix  solution! 
iterative  and  direct}  each  of  the  programs  under  study  was  a 
direct  sparse  matrix  solver.  Furthermore,  the  particular 
direct  methods  analysed  were  the  Gaussian  Elimination,  the 
LD  Decomposition  and  the  Gauss*Jordan  Reduction  algorithms. 
These  algorithms  were  tested  against  various  structures  of 
genersJ.  sparse  matrices}  the  ramifications  of  special  struc- 
tures (such  as  symmetric)  were  not  covered. 


Aasumptions 

The  testing  of  the  algorithms  against  the  indicated 
standards  involved  practical  problems}  therefore  a certain 
broad  class  of  problems  made  up  the  bulk  of  the  tests.  The 
example  of  Fig.  1 yielded  a "well-conditioned"  matrix  (de- 
fined in  Chapter  II)  which  was  also  diag^nsilly  dominant  and 
bended.  While  it  is  invalid  to  assuM  that  all  practical 
sparse  matrices  are  similarly  structured,  it  was  assumed 
that  ve^  badly  conditiemed,  near-singular  matrices  would 
not  generally  laeed  to  be  solved  by  these  programs  in 
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practice . 

It  itf  also  important  to  assunc  that  ths  computsr  into 
which  a futurw  user  may  load  any  of  th«s«  prograas  is  th« 
saas  aaks  as  that  ussd  to  produca  the  perfornance  data. 
Naturally t this  assuoqption  iBq;>lies  the  ability  to  duplicate 
the  results  of  this  thesist  new  packing  scheae  devel- 

oped as  part  of  this  thesis  relies  heavily  on  the  word  struc- 
ture of  the  CDC  6600  Computer  (cossson  to  both  APIT  and  AFWL). 
Any  claims  for  performance  based  on  the  esqperimental  data  of 
this  thesis  snist  be  referred  to  the  hardware  superiority 
which  the  CDC  6600  coeqmter  has  over  other  makes. 

AppgQi^ch 

Because  of  practical  limitations*  the  Yale  %>arse 
Natrix  Package  program  had  to  be  run  at  Weapons  Lab*  while 
SZMULT  and  the  third  algorithm  were  run  at  AFIT.  To  the 
greatest  extent  practicable*  however*  the  test  matrices  were 
standardised  so  that  the  results  of  all  three  prograas  were 
mutually  meaningful. 

The  programming  at  APIT  was  budgeted  $700  to  complete 
the  project.  To  efficiently  handle  the  test  matrices*  the 
test  data  were  stored  on  permanent  disk  files  and  read  into 
each  sparse  solver  from  a local  program  file  instead  of  from 
cards.  (Naturally*  the  finsUL  production  model  reads  all 
data  from  cards.)  Some  matrix  tests  were  simple*  dense 
matrices  with  known  solutions  | these  tests  were  used  to  ver- 
ify the  operation  of  the  packing  schemes  and  the  solution 
logic. 
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The  standard  t«st  Mtric«s  ««r«  all  of  sis*  100|  thia 
also  waa  large  enough  to  repreaeut  a "aparaa”  ayatee  aolu- 
tion,  but  not  too  large  aa  to  prohibit  execution  on  the  core- 
lieited  IKIERCOM  tereinala.  (The  final  variations  of  the 
theaia  coiqmter  work  were  acaled  up  to  handle  larger  aizea 
once  the  eaaential  coepariaona  and  teats  were  accoeplished. ) 

Once  the  basic  logic  of  the  new  sparse  solver  was  veri- 
fied* nodificationa  were  applied  to  teat  various  configura- 
tions of  solution  strategy  as  suggested  in  the  Hathenatical 
Theory  (Ch^ter  II )•  Finally*  once  an  optimiaun  algorithm 
waa  found,  a new  packing  scheaM  waa  incorporated  as  well  as 
other  modifications  to  make  the  program  execute  more  effi- 
ciently. 

Thesis  Preview 

The  text  of  this  report  contains  the  mathematical 
theory  required  to  understand  and  complete  the  project*  the 
descriptions  of  three  phases  of  tests*  the  method  for  choos- 
ing the  optimal  program*  and  a section  of  conclusions  and 
recommendations  suggested  by  the  thesis  work. 
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II.  Mathcmatic»>l  Theory 


To  und«rst&nd  the  criteria  for  choosing  the  best  Gatis- 
sian  sparse  matrix  algorithm,  it  is  necessary  to  first  con- 
sider the  algebraic  principles  used  for  linear  systems  as  if 
they  were  applied  to  an  ideal  problems  "perfect"  mathemati- 
cal accuracy  and  no  computational  limitations.  The  next 
consideration  is  the  effect  c>f  the  creation  and  propagation 
of  terrors  as  tiK3  mathematical  ideals  are  constrained  by 
practical  limitations.  Finally,  the  scope  of  the  problem  to 
be  solved  should  be  considered  to  decide  if  a particularly 
involved  solution  technique  is  really  required. 

The  mathematical  theory  discussed  will  therefore  cover 
the  three  principal  Gaussian  solution  methods,  the  causes  of 
errors,  some  strategies  which  -itteoqpt  to  minimise  the  ef- 
fects of  sot£«  err<.>is.  and  the  need  for  scaJiing  based  on  the 
context  of  the  proolem  to  oe  solved.  A summary  will  include 
some  qualitative  predictions  for  the  Gaussian  algorithms 
uncUr  comparison. 

gfMMiiin  Untir  gvgttwt 

Three  algorithms  used  to  solve  Eq  (1)  are  the  Gaussian 
Elimination,  the  LU  Decomposition,  and  the  Gauss-Jordan 
Reduction. 

fliyitlin  AXI  of  the  three  solution 

•cheaee  have  their  roots  in  Gaussian  Elimination.  In  the 
basic  form,  Gaussian  Elimination  is  a aeries  of  n forward 


operations  which  trsnsfoms  ^ into  *n  up^>cr- triangular  ma- 


trix S «Ax>s«  main  diagonal  elmants  ar«  unity;  then  in  the 
back  solution , is  computed.  The  terminology  used  to  de- 
scribe the  Gaussian  forward  process  is  as  follows t 


aj^j  ■ an  original  element  of 

(k> 

aJj  ■ the  value  of  a^^  ccMiq;>uted  during  the 
k-th  operation.'' 

Uj^j  at  an  clement  of  |||  or, 


b^  • an  original  element  of 


bj^  m the  vaaue  of  b^  coi^uted  during  the 
k-th  operation. 

m the  final  viJ.ue  for  bj|^. 

The  forward  operation  transform'^  fiq  (1)  into 


The  following  is  an  e2;ai^le  of  the  nomenclature  which 
describes  an  intermediate  step  in  the  forward  process i 


L ux2  ui3  «14 “In 


“23  “24  - - - W2n 


“1  PI 


(21  (2) 

•n3  “nd  • 


J 


The  sub-matrix  enclosed  by  the  heavy  line  is  referred 
to  as  the  ”k-th  derived  set'*;  in  the  preceding  cxaiaploy  the 
second  operation  has  just  been  done.  The  area  to  the  left 
of  the  diagonal  and  to  the  left  of  the  sub-matrix  is  strict- 
ly zero;  the  area  to  the  right  of  the  diagonal  and  above  the 
sub-uuitrix  is  the  partial  set  of  elements  of  U.  The  com- 
puter algorithm  for  the  k-th  derived  set  is 


(k)  (k-1) 

^ij  - ^ij 


k«l,2,.,.n-l 

^ik  ^ (k-1)  . 

• ^j  J«k^l,....n 

^ i*k+l,....n  (11) 


JlkICvX  p 2 f • • D 


(k-1) 


^(k-l) 

^ 

(k-1) 

•kk 


(k-1) 

(k-1) 


(k-1 


XslC^X  f • • e eFl 


(12) 


(k-1 ) 

The  term  is  the  diagonal  or  "pivot**  element  used  in 

the  k-th  operation.  The  back  solution  of  Eq  (9)  is  the  fol- 
lowing computer  algorithms 


jsial 


Uij  . Xj 


(13) 


dL«n-l  ( • • • 1 


IS 


Th«  preceding  applies  to  a dsnss  natrixf  for  a sparse 


■atrixy  however,  for  th  'se  eleeents,  *ij  t which  are  sero, 
no  tiM-consuaing  arithaetic  operation  is  necessary.  Pur« 


thexaore,  the  case  aay  arise  in  ivhich  a 


(k-1) 


is  sero  but 


(k) 

a^j  is  coaputed  to  be  non-sero.  ‘ntis  aanifestation  is 

called  "fill-in."  Additionally,  a sero  nay  appear  on  the 

diagonal)  appropriate  row  or  colusm  interchanges  can  be  utted 

to  prevent  a division  by  sero.  In  fact,  the  proper  choice 
(k-1) 

of  a^  aay  be  dictated  by  aany  criteria.  The  sparse 
Gaussian  Bliaination  and  a study  of  pivoting  strategies  is 
pjrograaaed  by  the  student  as  part  of  this  thesis. 

W LU  Decoo^sition  aethod  aakes 

use  of  the  "LU  Theorem"  (Ref  7 >27)  vdiich  states  that  the  aa- 
trix  ^ can  be  factored  into  two  unique  aatrices,  ^ and  Ut  L 
is  a lower- triangular  aatrix,  and  is  an  upper- triangular 
Bwitrix  whose  diagonal  elements  are  unity.  The  utility  of 
this  theorea  is  tliat  ^ and  can  be  deterained  without  ref- 
erence to  the  constant  vector,  Therefore,  once  ^ is  fac- 
tored, any  set  of  vectors  will  yield  iaeediate  unique 
solutions  for  the  corresponding  set  of 

The  factorisation  of  ^ into  ^ and  ^ represents  two  tri- 
angular systeas  (Ref  7*29) i 


flbe  » y 


Ijr  - b 


(1^>) 


(15) 


lb  use  a 


ixS^ 


mter  algoritha  to  factor  ^ and  to  solve  Bqs  C.^) 
can  use  the  following  procedure* 
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^ • ]l  (Giv«n)  (1) 

Prcaultiplication  of  hy  ^ n-by-n  identity  matrix  givtts 

^ « lb  (16) 

Instead  of  involving  b the  derived  sets*  as  in 

Bq  (13 1 algoriths  should  a|^ly  aritlmetic  operations  to 
the  identity  matrixt  multiplication  of  a row  by  a scalar 
should  be  carried  through  the  row  of  1.,  and  aanipulation  of 
elements  through  row  addition  should  create  new  elements  be- 
yond the  diagonal  of  X*  As  a result,  the  matrix  X wxll  be 
transformed  into  a general  matrix  G.  It  can  be  proved  that 
£ is  a lower  triangular  matrix.  Therefore,  Eq  (16)  becomes 

I2&  - Sb  (17) 

Premultiplication  of  both  sides  of  Bq  (17)  by  yields 

G"^Ua  - (G"^G)b  (18) 

which  further  reduces  to 

- Ife  - b (19) 

By  uniqueness  of  the  LU  Theorem,  one  therefore  concludes 
that 

a*^  • k (*o) 

Thus  the  computer  algorithm  really  solves  Bq  (I6)  as 

la  - (*i) 

Wten  b 1*  entered  into  the  computation,  Bqs  CUO  *nd  (15) 


^ (22) 

and 

2.  - (23) 
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Bqs  (22)  (23)  reduce  to  a re-stateraent  of  Eq  (9)  since  ^ 

is  identical  to  factorization  of  A and  the  solution 

for  ^ is  the  same  as  the  forward  Gaussian  Elimination  pro- 
cess | the  back  solutions  in  both  Gaussian  Elimination  and  LU 


Decoiqjosition  represent  the  same  procedure. 

Any  techniques  which  aid  the  forward  process  of  Gaussi- 
an Blinination  (such  as  row  and  column  interchanges)  can  be 


used  with  LU  Decomposition.  In  theory,  therefore,  Gaussian 
Elimination  and  LU  Decomposition  give  the  same  results  if 


the  same  pivoting  strategy  is  used. 

In  the  context  of  computer  operations,  he  LU  Deccmipo- 
sitioii  of  A can  be  stored  for  future  use  for  any  number  of 
particular  solutions  for  ic  given  any  These  subsequent 
solutions  represent  a considerable  savings  in  computer  time. 
However,  extra  space  must  be  provided  to  store  X it  grows 
into  g.  For  a "one-time”  solution  of  Eq  (l)  the  LU  Decom- 
position algorithm  may  not  be  appropriate. 

The  LU  Decomposition  sparse  matrix  solver  is  used  by 
Sherman  (Ref  14)  in  the  Yale  Sparse  Matrix  Package  (YSKP)c 

Reduction.  The  Gauss- Jordaun  Reduction  be- 
gins with  the  saaae  matrix  setup  ais  in  Gaussian  Elimination. . 
TB. substantiail  difference  is  that  in  the  k-th  operation, 
adl  elements  above  and  below  the  diagonal,  (in  the  k-th  col- 


uam)  are  eliminated.  The  computer  aULgorithm  for  the  sub- 
matrix of  4 computation  of  the  elements  of  are  the 

seaM  mm  Bqs  (U)  and  (12)  except  that  the  range  of  the  index  i 
is  from  1 to  n (Ref  13>400,401).  The  geoamtry  of  the  k-th 
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derived  set,  therefore,  is  not  a shrinking  square  sub>inatrix 

t 

which  collapses  about  the  diagonal  (Eq  [lO] ) but  rather  a rec- 
tangular sub-matrix  whose  width  collapses  from  left  to  right. 
The  following  is  the  nomenclature  for  the  second  derived  set 
of  Gauss-Jordan  Reductions 


1 

— 

“ - 
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1 

1 
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(2) 
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1 

^3 

e 

^3 
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1 
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J2) 

0 

0 

1 

1 

Si3 

^l4 

mm 

j>n  ^ 

(24) 


To  the  left  of  the  dotted  line,  an  identity  matrix,  X* 
taking  shape.  Thus,  only  one  forward  pass  of  n operations 
is  needed  to  solve  for  | s 


to  - fe*  (25) 

♦ 

(The  constant  vector,  in  Bq  [25]  is  not  the  same  final 
constant  vector  in  Bq  [9]*)  It  may  first  seem  that  Gauss- 
Jozdan  Reduction  is  the  most  efficient  way  to  deal  with  lin- 
ear systeswf  however,  for  a dense  matrix,  Gauss-Jordan  Reduc- 
tion requires  almost  50%  more  arithisetic  operatiohs  than 
Gaussian  Elimination  (Ref  13t40l). 

A Gauss-Jordan  algorittn  usually  takes  less  space  in  a 
computer  than  any  Gaussian  Elimination  program*  But  in  the 
solution  of  some  problems  by  Gauss-Jordan  Reduction,  the 
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csqponents  of  the  conputed  data  tend  to  growt  this  growth,  in 

r 

a large  system,  becg^^s  intolerable  even  in  the  best  digital 
coiqputer.  Pivoting  strategies  can  be  allied  to  Gauss- Jor- 
dan Reduction!  however,  the  effects  of  a particulao:  pivoting 
strategy  are  often  different  in  the  Gauss-Jordan  algorithm 
as  con^Mired  to  Gaussian  Elimination.  John  Key's  computer 
program  "SIMULT"  uses  the  Gauss-Jordan  algorithm  (Ref  9)* 


The  most  xmportant  consideratxon  xn  the  solutxon  of  a 
sparse  system  is  that  it  represents  an  approximation  of  sonie 
physical  system.  But  to  prop?‘r\y  analyze  the  errors  spawned 
in  the  sparse  computer  solution,  it  is  assumed  that  the  un- 
certainties in  the  given  elements  and  bj  are  zero  before 
the  operation  begins.  (The  uncertainty  of  an  arbitrary 
quantity,  u,  will  be  annotated  ais  "bvi,") 

The  kinds  of  errors  which  have  a direct  bearing  on  the 
solution  cf  sparse  systems  are  round-off  error,  truncation 
error,  instability,  and  fill-in  proliferation. 

Hound-off  Errors.  In  a t3rpical  digital  computer,  the 

product  or  quotient  of  a multiplicative  operation  a^spears  in 

• 

a double-length  accumulator*  Before  the  contents  of  that 
accumulator  are  stored  in  a data  location,  the  Ic^r  order 
digits  are  rounded  off.  For  floating  point  numbers  in  the 
OX  6d00  computer,  the  mantissa  of  a number  can  be  cosfwited 
with  accuracy  up  to  14  decimal  places  provided  no  other 
error  is  introduced* 


Truncation  Error.  Before  two  numbers  can  be  added  in  a 


computer,  the  smaller  number  must  be  right-shifted  so  that 
the  exponents  are  normalised.  If  two  numbers  whose  expo- 
nents differed  greatly  were  added,  the  lower  significant 
digits  of  the  smaller  number  would  be  lost;  the  accuracy 
once  contained  in  the  lost  digits  would  not  be  carried  over 
into  the  sum. 

One  may  infer  into  the  discussion  of  round-off  and  trun- 
cation errors  that  the  smallest  algebraic  error  in  a solu- 
tion scheme  may  be  obtained  by  minimizing  the  number  of  mul- 
tiplicative and  additiv<t  operations. 

Instability.  The  "instability”  in  the  solution  of  a 

system  is  a qualit«vtlve  measure  of  how  algebraic  errors  have 

i 

grown  to  the  detriment  (i;f  the  final  answer.  The  following 
exasq>le  shows  that  errors  from  unstable  systems  result  from 
the  type  of  algorithm  used  and  not  the  computer  itself. 

X 

z m — — (the  algorithm)  (26) 

y 


If  X m 1.0  and  6x  ■ O,  then  what  are  x and  6s  if 
two  values  of  y and  6y  ? 

Case  It  yx  ■ o.oioo,  6yx  ■ o.oooi 
8 ■ 100,  and  6s  • 2.00C2 
Case  lit  • 1.000,  6y2  ■ 0.001 
s ■ 1,  and  6«  ■ 0.002 


there  are 


(27) 

(28) 


In  Case  I,  s night  be  stored  as  97.9998  (about  a 2%  error), 
and  in  Case  11,  s mitfht  be  stored  as  0.998  (only  a 0.29  er- 
ror). Bven  though  6y2  ***  larger  than  6yxf  division  hy  the 
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taaXler  number  (yj^)  amplified  the  error  much  more  than  divi- 
sion by  the  larger  number  (y2)*  Even  though  truncation  and 
round-off  errors  themselves  are  on  the  order  of  10**^^ « a 

SMll  number  were  used  as  the  pivot  element,  the  result  ..ig 
cosiputation  could  contain  a substantial  net  error.  In  this 
regard,  the  algorithm  of  Bq  (26)  would  be  deemed  relatively 
unstable  if  it  chose  y^  and  relatively  stable  if  it  chose  y2* 
Accordingly,  an  algorithm  which  actively  seeks  the  larger 
numbers  for  pivot  elements  is  said  to  be  more  stable  than  an 
algorithm  which  ignores  the  relative  sizes  of  possible  pivot 
elesMnts. 

Sparse  Matrix  Fill-in  Errors.  In  a large  sparse  matrix, 

it  is  iaqjortant  to  store  only  the  non-zero  elements  of  if 

a fill-in  value  is  calculated,  there  must  be  room  available 

(k) 

to  store  the  new  a^j  . A little  fill-in  is  nonsally  accept- 
able, but  a large  amount  may  exceed  the  storage  capability 
of  a digital  computer.  More  importantly,  with  the  prolifer- 
ation of  fill-in,  the  algorithm  is  faced  with  many  more 
arithmetic  operations  (resulting  in  more  algebraic  uncer- 
tainty). Worse  yet,  in  some  problesM,  the  fill-in  values 
are  relatively  small  numbers,  and  the  possibility  exists 
that  this  kind  of  fill-in  may  become  pivot  elements.  Further 
errors  due  to  instability  may  result.  Thus  a choice  of  piv- 
ot elements  which  is:?.aimises  fill-in  may  reduce  error  grosrth. 

Msiqf  pivoting  strategies  have  been  developed  which  at- 
tempt to  resolve  these  types  of  errors. 


aa 


f W 
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A pivoting  strategy  is  a pxazt  of  a computer  algorithm 
idiich  chooses  an  element  a|j  to  be  the  new  a^  (the 
pivot  element)  based  on  some  desired  outcome.  A demonstra- 
tion for  the  need  for  strategic  pivoting  is  found  in  Appen- 
dix A.  The  following  are  three  examples  of  the  most  com- 
monly used  strategies  for  general  matrices. 

Diaopiial  Pivoting.  Diagonal  Pivoting  strategy  is  real- 
ly no  s'crategy  at  all.  Each  of  the  n operations  chooses  the 
diagonal  element  for  the  k-th  pivot  without  regard  for  the 
results  of  any  previous  operation.  Hence,  instability  is 
possible.  Moreover,  if  a zero  were  on  the  diagonal,  the 
coiqxtter  operation  would  halt  abruptly. 

Gaussian  Partial  Row_  Pi voting.  The  Partial  Row  strate- 
gy goes  through  the  rows  consecutively | the  largest  element 
in  the  pi'V'ot  row  is  selected  as  the  new  a^  . A colusin 
interchange  in  the  matrix  and  a re-arrangement  of  the  com- 
ponents of  are  necessary  to  get  the  pivot  element  onto  the 
diagonal.  The  advantages  of  Partial,  Row  pivoting  are  that 
such  an  algorithm  can  handle  any  tx>n-singular  matrix,  even 
if  a sero  were  to  4^ppear  on  the  diagonal,  and  that  numerical 
stability  is  enhanced  by  use  of  the  largest  element.  There 
is,  of  course,  the  requirement  for  extra  programming  for  the 
column  manipulation. 

Gaussian  Pull  Pivoting.  The  Pull  Pivoting  strategy 
searches  the  entire  submatrix  of  ^ for  the  element  with  the 
largest  absolute  value.  In  this  case,  a set  of  row  and 
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column  interchanges  may  be  necessary.  Full  Pivoting  is  con- 
sidered to  be  the  most  accurate  pivot  scheme  for  dense  ma- 
trices | however f not  only  must  additional  programming  be 
done  for  row  interchanges « but  considerable  execution  time 
will  be  spent  searching  the  remaining  subnatrix  to  find  the 
largest  value. 

The  preceding  pivoting  schemes  are  those  which  are 
classically  associated  with  dense  matrices;  while  pivoting 
for  stability  is  a good  idea«  conqplete  disregard  for  other 
factors  common  to  spaurse  matrices  can  lead  to  massive  errors. 
Popular  sparse  pivoting  strategies  are  generally  classified 
as  *a  priori”  or  "local”  strategies. 

A Priori  Strategy.  An  a priori  scheme  is  one  in  srtiich 
the  overall  strategy  for  the  selection  of  pivoting  has  been 
decided  for  the  entire  forward  process  before  any  operations 
are  done.  The  most  comsion  usage  of  an  a priori  pi^'oting 
strategy  is  the  case  where  a system  is  so  vast  that  it  can- 
not completely  reside  in  the  computer  core  and  mist  be 
stored  on  tape  or  disk.  The  rows  are  permuted  so  that  they 
appear  in  increasing  size.  The  pivot  can  be  chosen  as  the 
first  non-zero  element  of  the  row  (Mkely  to  be  the  diagonal 
element).  A priori  schesMs  can  be  used  for  some  special 
cases  where  the  entire  array  does  reside  in  core.* 

Local  Strategies.  A local  pivoting  strategy  checks  the 
present  status  of  the  remaining  sub-matrix  of  ^ before  the 
k-th  operation;  the  pivot  element  is  chosen  according  to  the 
dictates  of  the  strategy.  Local  strategies  are  better  than 
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* priori  strategies  in  preserving  sparsity  or  operation 
count  (Ref  6t505).  The  following  exanples  are  some  of  the 
more  popular  local  strategies. 

1.  HMCKarrii -Slrft t^gy. « The  Markowitz  procedure 

(k-1 ) 

chooses  the  pivot  element  ais  that  element  a^j  for  which 
the  product  of  the  number  of  non-zeros  in  the  coltunn  and  the 
number  of  non-zeros  in  the  row  is  a minimum.  To  scan  a 
large  submatrix  for  the  ai^ropriate  pivot  element  would  take 
considerable  time.  This  scheme  is  meant  to  minimize  the 
number  of  arithmetic  operations  and  immediate  fill-in.  At 
no  time,  however,  is  the  absolute  value  of  the  pivot  consid- 
ered for  numerical  stability. 

2.  Minimum  Row/Minimum  Column.  A scheme  which  is 

slightly  less  effective  than  the  Markowitz  strategy  but  more 

simple  is  the  Minimum  Row/Minimum  Column  technique.  The  as- 
(k-1) 

signment  of  a^  is  given  to  that  element  in  the  sousllest 
^lumn  of  the  smallest  row  in  the  remaining  sutmatrix. 

Again,  no  checks  are  made  for  numerical  stability. 

3.  HinilWIff  Row/Maxiamm  Bleownt.  A scheme  sioiilar  to 
Gaussian  Partial  Pivoting,  the  Minianim  Row/Maximum  Element 
technique  seeks  the  largest  element  of  the  smallest  row  in 
the  remaining  subsuitrix.  A compromise  has  been  made  between 
the  number  of  computations  and  stability. 

There  are  many  other  local  pivoting  stratec^ies  (Ref  i 
92,93)  which  have  been  tested}  as  with  these  and  all  cf  the 
previously  discussed  strategies,  a dilemma  arises.  Ideally, 
it  would  be  desirable  to  minimise  fill-in,  maximise  stability. 
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and  conputa  the  ninimun  number  of  calculations  as  necessary; 
however,  these  three  criteria  are  not  all  mutually  exclusive 
of  each  other.  For  example,  the  Markowitz  strategy  pivots 
for  coiqxitational  reduction  without  regard  to  stability; 

Full  Pivoting  acts  to  stabilize  without  regard  to  the  amount 
of  calculation  or  fill-in.  Figure  5 ahows  a philosophical 
view  of  the  dilemma.  If 
the  "cost**  of  one  criter- 
ion were  linked  with  the 
length  of  the  "line" 
joining  the  criterion  and 
the  actual  strategy  used, 
atttt^pting  to  shorten  one 
line  (to  improve  perform- 
ance in  that  respect) 
would  stretch  out  the  other  two,  and  hence  the  "cost"  would 
increase.  The  "cost"  would  be  measured  in  a rise  in  coaqput- 
er  time  or  performance  degradation. 

HtiWgiM  SlrattflY  once  any  scheme  is 

programmed,  it  may  be  of  interest  to  compute  the  scalair  re- 
sidual error  as  a performance  value.  An  algorithm  would  ccei- 
pute  this  value  as  the  average  error  per  equation,  R,  in  the 
follos^lcg  manner; 

R u £ abs(bj^  - b|)  (29) 

n i«l 

where,  if  ^ is  the  calculated  solution,  then 


Fig.  5.  Pivoting  Dilemma 
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It  is  ths  totsl  sffcct  of  th«  errors  vrtiich  gives  rise 
to  R|  when  discussing  the  theoretics!  upper  bounds  on  the 
errors,  it  is  helpful  to  define  sn  error  Bistrix,  6^  ss  that 
matrix  which,  when  added  to  yields  the  computed  value  of 
2Sc  ideal  computation  took  place. 


♦ 6A)»c  • ^ 


(31) 


There  is  no  doubt  that,  in  general.,  the  choice  of  pivoting 
strategy  has  ar  important  effect  on  the  size  of  more 

precisely,  the  Euclidian  Norm  of  6/i*  The  Euclidian  Norm  of 
any  matrix,  }||,  is  defined  as  (Ref  1^417 )t 


norm({i) 


-1 1/* 


R R 

2 2 ml. 

i.l  j«l 


(32) 


James  Bunch  adds  a new  wrinkle  to  the  pivoting  dilemma  for 
Gaussian  Elimination  i 

The  error  matrix  [5^  arising  fr^  performing  the 
elimination  process  in  finite  precision  depends  on 
the  fill-in  occurring  during  the  eliminationc  Vfe 
could  seek  an  ordering  of  equations  ;iO  that  the  bound 
on  [norm  (6/^)]  is  minimized.  This  would  not  be  equiv- 
alent to  the  seeking  of  an  ordering  to  minimise  fill- 
in.  Indeed,  we  see  that  minimizing  fill-in  helps  to 
keep  the  bound  on  [norm  (5A)]  from  becoming  too  large. 

The  problem  is  even  more  difficult  if  we  need  to  pivot 
for  stability  (Ref  2t873). 

Ainch  suggests  that  the  structure  of  the  matrix  has  a great 
deal  influence  on  the  net  error.  For  example,  if  the  a pri- 
ori pivoting  strategy  mentioned  on  page  24  were  used  with  a 
tightly  banded,  diagonally  doadnant  matrix,  one  would  msqpmct 
very  good  accuracy  and  little  fill-in.  By  the  structure  of 
the  matrix,  the  chosen  pivot  element  will  be  frost  the  set  of 
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Iar9«  nuBibftrs  on  th<t  diagonal,  and  there  will  be  relatively 
few  places  within  the  band  structure  to  allow  fill-in.  But 
the  sane  scheme  with  an  arbitrary  matrix  would  not  be  nearly 
as  successful.  In  fact,  the  derivation  by  Bunch  considers 
the  upper  bounds  for  norm  (64)  in  the  case  of  banded  ma- 
trices; for  ver>r  widely  banded  matrices  or  unbanded  matrices 
the  minimisation  of  fill-in  may  overshadowed  by  numerical 
instability.  In  any  case,  the  structure  of  the  system  to  be 
solved  and  the  desired  performance  influence  the  choice  of 
pivoting  strategy. 

Structure  and  pivoting  have  their  own  peculiar  effects 
on  Gauss-Jordan  Reduction.  The  strict  error  analysis  of 
Gauss-Jordan  Reduction  is  difficult;  in  Gaussian  Elimination 
the  study  of  error  can  be  represented  by  Bq  (31)  in  that  the 
system  (4  t 64)  represents  a "neighboring*'  system  of  4* 
other  words,  the  resulting  cosputed  solution,  2$^*  ^ 

"neighborhood"  of  the  true  solution  ^ as  specified  by  Eq  (1) 
But  in  Gauss-Jordan  Reduction,  it  is  difficult  to  prove  that 
Sq  is  always  in  a neighborhood  of  ^ (Ref  12t21);  in  the  con- 
text of  Bq  (31)f  the  system  actually  solved  is  not  strictly 
a neighboring  system  cf  4*  problems  associated  with 

Gauss-Jordan  Reduction  result  from  failure  to  control  the 
growth  of  the  elements  above  the  diagonal. 

The  Gauss-Jordan  algorithm  can  be  seen  as  a combination 
of  above  and  below  diagonal  elimination  which  yields  the 
Identity  matrix  in  Bq  (25).  The  below-diagonal  elimination 
is  identical  to  Gaussian  Elimination,  and  thus  the  errors 


fro*  th«s«  computations  srs  linitsd  to  thoss  which  arise 
from  Gaussian  Elimination.  As  for  the  above-diagonal  elim- 
ination, no  guarantee  can  be  made  for  any  system  which  ig- 
nores stability}  but  even  with  Partial  Row  pivoting,  the 
growth  of  the  above-diagonal  elements  may  be  arbitrarily 
large  (Ref  12i2l).  It  is  known  that  positive  definite  and 
diagonally  dominant  A siatrices  are  stable  with  Gauss-Jordan 
Reduction  with  Partial  Row  pivoting;  but  in  the  comparison 
phase  of  this  thesis,  it  should  be  emphasised  that  Key's 
“SIMULT"  program  uses  Hinimim  Row/Minimum  Column  pivoting 
vdtich  is  still  subject  to  numerical  instability. 

AlgoritiJns  for  scaling  are  used  to  isprove  the  "condi- 
tion” of  some  systems;  the  relative  condition  of  a system 
refers  to  two  factors t 1)  the  relative  magnitudes  of  neigh- 
boring elements  both  before  and  during  elimination,  auxl 
2)  the  uncertainty  with  which  each  of  the  original  a^j  and 
b^  were  i^roximated.  (Hsre'tofore,  6aj^j  and  were  as- 
sumed to  be  sero.)  If  ^ is  "well-conditioned,”  then  the  in- 

(O)  (O) 

herent  errors  fta^j  and  db^  will  not  be  amplified;  but  in 
an  "ill-conditioned"  system  even  a small  error  is  likely  to 
grow  past  acceptable  limits  (Ref  13i39d,397). 

Por  example,  the  situation  nay  arise  irtien  b^^  is  meas- 
ured in  milliwatts  and  b2  is  measured  in  killowatts;  the 
oorre^x>nding  aj^j  and  a2j  will  necessarily  be  out  of  propor- 
tion, Ho  pivoting  strategy  alone  could  be  stable  enough  to 
handle  this  sort  of  problem.  However,  the  rows  and  columns 
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of  A c&n  be  scaled  to  a more  «vorkable  size  relationship  such 
as  a row  or  column  norm  (Ref  IStlO). 

To  bring  neighboring  rows  into  line,  each  column  should 
be  divided  by  that  column  element  with  the  largest  absolute 
value;  the  matrix  which  scales  A this  way  is  a diagonal  ma- 
trix, whose  elements  are  the  reciprocals  of  those  maxi- 
mum column  elements . To  maintain  equivalence  of  Eq  ( 1 ) , 
postirtultiplication  by  is  required! 

h fii  “ h (33) 

Then,  to  align  the  columns,  a row  scaling  is  required;  the 
row  element  with  the  largest  absolute  value  is  divided  into 
the  row  and  corresponding  b^^.  The  scaling  matrix  is  another 
diagonal  matrix,  D^i  whose  elements  are  the  reciprocals  of 
these  row  scales.  The  solution  x of  Bq  (1)  is  the  same  as 
that  of  the  following  (Ref  15tll)t 

gj  A . Dgb  (34) 

Bq  (34)  reduces  to  the  final  form  of 


A'jc*  - b' 

(35) 

where 

£ - gl  £ 

(36) 

A’  - fiaABi 

(37) 

and 

fe'  - Bafe 

(38) 

Since  and  diagonal  matrices » their  storage  re- 

q^ire■ents  are  only  n locations  each  for  the  diagonals,  and 
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their  inverses  ere  essily  cslculeted. 

For  s progrsB  which  is  designed  to  be  an  ell«encompas» 
sing  sparse  natrix  solver,  a scaling  algorithn  should  be  an 
integral  part.  However,  if  problems  are  limited  to  those 
with  well -conditioned  systems,  scaling  need  not  be  used.  In 
fact,  scaling  would  require  extra  multiplications  for  each 
non-sero,  and  the  computer  search  for  the  scaling  elements 
would  be  time-consuming  (even  if  simple). 

Therefore,  if  some  regard  is  paid  to  numerical  stabili- 
ty in  the  solution  algorithm,  and  if  the  scope  of  the  prob- 
lems to  be  solved  is  reasonably  constrained,  no  scaling  al- 
gorithm is  really  needed. 

Based  on  the  preceding  discussions,  some  predictions 
for  the  Gaussian  sparse  solvers  can  be  made  as  a result  of 
the  mathematical  theory.  Short  analyses  of  pentadiagonal 
and  arbitrary  matrices  will  be  discussed  for  Gaussian  Elimi- 
nation and  Gauss- Jordan  Reduction,  (Gaussian  Blimi.nation 
and  LU  Decomposition  will  be  classified  together  since  they 
are  arithmetically  similar.) 

w>i»^<^nonal  Case.  A diagonally  dominant,  pentadiag- 
onal matrix  is  a common  problem  to  solve  in  nuclear  p^hysics. 
With  this  structure,  almost  any  a priori  or  local  pivoting 
strategy  in  Gaussian  Elimination  will  choose  pivots  consec- 
utively on  the  diagonal.  The  notable  exception  may  be  Pull 
Pivoting  for  which  no  guarantees  can  be  made.  Also,  in  the 
htntmum  Row,^tLninum  Colunn  scheme. 


it  ie  possible  that  two 


or  Bore  rows*  hav«  the  smallest  size  (rows  one  and  n,  for  ex- 
ample); but  most  algorithms  usually  have  "tie  breaking" 
rules  which  choose  the  first  element  to  meet  the  criteria  of 
the  strategy  as  the  pivoc.  In  this  pentadiagonal  case, 
there  should  be  no  fill-in  and  the  accuracy  should  be  very 
good. 

On  the  oth«r  hand«  the  Gauss-Jordam  Reduction  with  the 
Minisnm  Row/Minimum  Column  pivoting  strategy  will  fill  in 
greatly  with  a pentadiagonal  matrix.  (This  strategy  is  that 
of  the  SIMULT  program  to  be  coBqpared  in  this  thesis.  ) The 
fill-in  of  at  least  two  values  per  row  (in  columns  four  and 
five)  will  occur  in  all  but  the  first  two  and  last  tuvo  de- 
rived sets  (Pig.  6).  Purthermore , these  fill-in  values  will 
have  been  calculated  using  previous  fill-in.  And  lastly, 
the  final  pivot  operations  will  be  in  colusuis  four  aund  five 
and  thus  are  bound  to  yield  significant  errors. 

Arbitrary  Caise.  Gaussian  Elimination  will  show  a wide 
rauige  of  performance  with  different  pivoting  schemes.  Por 
example,  Pull  Pivoting  could  easily  choose  a pivot  in  the 
largest  row  and  create  vast  asxmnts  of  fill-in.  Even  Mini- 
mum Ror/Minisnia  Column  could  "Juiqp"  around  the  smitrix  for  a. 
proper  pivot;  as  a result,  even  though  imaMdiate  fill-in  is 
localised  and  small  in  amount,  it  would,  in  fact;  remain  to 
be  used  repeatedly.  Thus  fill-in  could  enter  into  many  cal- 
culations and  perhaps  mvmn  become  a pivot  later  on. 

Conversely,  in  Key’s  Gauss- Jordan  Reduction,  not  only 
will  the  immediate  fill-in  be  minimised t but  also  the  fill-in 
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ii  likely  to  be  eliminated  soon  after  its  creation;  thus 
even  with  large  amounts  of  fill>in,  the  errors  are  less 
likely  to  cascade  as  badly  as  in  Gaussian  Elimination. 

Therefore » it  is  predicted  that  Key's  program  will  out- 
perform Gaussian  Elimination  on  very  unstructured  matrices, 
idiile  Gaussian  Elimination  proves  to  be  more  effective  on 
■ore  structured  systems.  At  sooie  "degree"  of  randomness, 
both  performances  should  be  coii^>arable.  Also,  in  the  study 
of  Gaussian  Elimination  with  pivoting,  good  accuracy  may  be 
attained  by  strategies  which  either  eliminate  fill-in  short- 
ly after  its  inception  or  localize  fill-in  so  that  it  is  not 
involved  with  too  many  subsequent  calculations;  this  claim 
should  hold  true  even  in  cases  with  large  ainoun':s  of  fill-in. 

T 

Finally,  the  t3rpe  of  problem  vdiich  the  user  has  iu  mind  will 
be  the  guiding  force  in  choosing  the  alg  rithm  and  pivot 
strategy. 


III.  j.apnff 


The  testing  of  three  major  Gaussian  algorithms  was  done 
on  GDC  6600  computers  at  the  Air  Force  Weapons  Lab  (AFWL) 
and  the  Air  Force  Institute  of  Technology  (AFIT).  Standard 
main  programs  were  used  for  all  three  algorithms  to  accomo 
plish  the  following  functi  kSi  packing  the  sparse  matrix 
into  an  appropriate  form,  executing  and  timing  the  particular 
Gaussian  algorithm,  computing  an  average  scalar  residual 
error,  and  printing  out  the  solution. 

The  coiq>arisons  of  these  programs  contain  a ci^sule 
description  of  each  sparse  solver,  the  initial  testing  pro- 
cedure, and  a summary  which  suggests  the  direction  of  further 
study. 

The  names  of  the  programs  under  study  are  the  Yale 
Soars#  Matrix  Package  (YSMP),  by  Shermani  the  "SIMULT”  pro- 
gram, by  John  B.  Keyi  and  the  "MFF"  study  by  this  student. 

While  Sherman's  YSI'IP  program  contains  many 
FORTRAN  subroutines  for  solV|^ng  various  special  t>pes  of 
sparse  matrices  (synmetric,  for  example),  only  those  sub- 
routines needed  for  general  sparse  matrices  were  coo^jared. 

The  YSMP  uses  an  a priori  pivoting  scheme  for  LU  Deconposi- 
tiont  a permutation  array  is  generated  to  order  both  the  rows 
and  the  columns  of  ^ for  pivoting  (Ref  l4il5).  The  packing 
schsms  is  similar  to  that  suggested  by  Gustavson 
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(Ref  8i43,44)t  only  the  non-zero  elements  of  A are  packed, 
and  a roe  and  column  pointer  table  computes  the  "address"  of 
the  desired  a^j  for  future  computation. 

SIKULT.  Key's  SIMULT  program  uses  Gauss-Jordan  Reduc- 
tion ^ith  Minimum  Row/Minimum  Column  pivoting.  The  calling 
program  must  supply  an  important  data  value  called  "2rrEST"| 
during  the  calculations,  if  the  magnitude  of  a result  is  com- 
puted to  be  less  than  ZTEST,  it  is  automatically  set  to  zero. 
The  packing  scheme  uses  a compressed,  two-dimensional  FORTRAN 
array  for  the  maximum  number  of  allowed  non-zeros  per  row 
is  determined  by  the  user.  (Key  recommends  no  more  than  20 
to  30  elements  per  row  as  adequate  to  handle  fill-in. ) A 
siadlarly  structured  two-dimensional  pointer  array  stores  the 
"J"  column  coordinates  of  the  corresponding  ^ values 
(Ref  ftlO). 

MFP.  The  MFP  program  is  a study  of  Gaussian  Elimination 
with  various  pivot  strategies.  Tl>e  packing  scheme  is  identi- 
cal to  that  used  in  YSMP.  The  following  variations  used  the 
indicated  pivot  strategies  in  the  course  of  the  algorithm 
construction  and  the  initisd  testings 

MTl  - Diagonal 

MPP2  - Row  Partial  Pivoting 

MPP3  - Gaussian  Pull  Pivotir;; 

MPP4  - Minimum  Row/Minimum  Column 
MTPS  - Minimum  Row/Maximum  Element. 

Instead  of  re-sluif fling  the  rows  and  columns  for  pivoting, 
the  program  stored  the  order  of  row  and  column  pivot  coordi- 
nates into  two  arrays  called  IPZV  and  JPTV.  These  arrays 


w«rc  then  passed  to  the  back  solution  subroutine  to  properly 
cosqpute  2i*  resulting  nuLtrix  U may  not  have  appeared  to 

be  upper- triangular}  but  if  the  row  and  column  interchanges 
were  done  as  prescribed  by  IPIV  and  JPIV,  would  have  indeed 
i4>peared  as  upper- triangular.  (Appendix  B contains  the  flow 
charts  for  the  most  io^rtant  subroutines  in  MPP. ) 

Testing  Procedures 

The  criteria  for  the  initial  testing  of  the  three  Gauss- 
ian sparse  solvers  were  the  required  program  space*  the  or- 
ders of  magnitude  of  the  scalar  residual  errors*  and  the  exe- 
cution time  for  four  standard  matrix  problems. 

Program  space.  Table  I contains  a suosnary  of  ^;^ndix  C; 
this  comparison  lists  the  storage  space  required  for  the 
Gaussian  algorithms  excluding  the  main  programs  and  the  FOR- 
TRAN system  routines.  All  of  the  variations  of  MPP  are  in- 
cluded. 

Table  I 

Essential  Programming  Spa^CB  for  Sparse  Solvers 


Program  Length  (Octal  )i  Program  Data 


YSMP 

2134 

20207 

SZHULT 

532 

17662 

NFPl  - Diagonal 

757 

16666 

mF2  - Partial 

763 

16666 

MPP3  - Full 

1034 

16666 

MnP4  - Min  Row/Min  Col. 

1335 

16666 

NPP5  - Min  Row/Msx  Ble. 

1145 

16666 

( 
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The  data  storage  requirements  were  set  as  that  space  nvsces- 
sary  to  solve  any  lOO-by-100  system  at  5%  sparsity. 

Standard  Test  Results.  Four  test  matrices  (Appendix  D) 
were  run  in  each  configuration}  all  tests  of  YSMP  were  done 
at  AFWL  and  tests  of  SIMULT  and  MPP  were  done  at  APIT.  The 
average  scalar  residual  error  was  calculated  in  each  routine 
as  in  Bq  (29),  and  the  "TIMER**  function  (Ref  10)  provided  the 
time  required  to  execute  only  that  portion  of  the  programs 
that  called  the  Gaussian  solvers.  As  a result,  each  Gauss» 
ian  solver  was  examined  truly  independently.  The  first 
phase  of  the  conpajtrison  is  listed  in  Table  II. 

Table  II 

Initial  Ccxoparisons  of  Sparse  Solvers 

Test 


Matrix 

YSMP 

SIMULT* 

MFPl 

MFP2 

MFP3 

MFP4 

MPP5 

#1 

Errors 

Timet 

10-1^ 

0.12 

10+13 

0.28 

lo-i'^ 

0.48 

10-14 

0.48 

10-1 

2.67 

10-14 

1.31 

10-14 

0.58 

02 

Brrox’t 

Times 

lOO 

0.11 

failed 

10+1 

0.47 

10+1 

0.65 

10+11 

0.93 

10*1 

0.55 

10+1 

0.74 

#3 

Errors 

Timet 

io“® 

0.11 

10"® 

0.33 

10“® 

0.47 

10-® 

0.60 

10-3 

1.24 

10*® 

0.55 

10-® 

1.32 

#4 

Errors 

Timet 

10-1^ 

0.06 

10*3® 

0.26 

10-14 

0.29 

10-14 

0.29 

10-14 

0.68 

10-14 

0.88 

10-14 

0.37 

Error  calculated  as  in  Bq  (29). 
Time  measured  in  seconds. 

*ZIB8T  for  SIMULT  runs  ■ 10*^^. 


The  first  phase  of  testing  confirmed  two  important  aspects 
I of  the  theory  sectiont 

1.  LU  Decomposition  and  Gaussian  Elimination  can 
yield  comparable  accuracy. 

2.  The  GausS'Jordan  Reduction  in  SIMULT  performed 
poorly  for  very  structured  matrices. 

Test  matrix  #2  is  a near-singular  matrix;  both  YSMP  and  MFP 
solved  it,  although  badly.  But  SIMULT  determined  the  matrix 
to  be  singular;  this  is  due  to  one  or  more  critical  elements 
being  computed  to  be  less  thaui  ZTEST.  As  a result,  some  im- 
portant non-zero  data  was  cast  aside  resulting  in  a singu- 
larity. In  any  case,  test  matrix  #2  was  a bad  test,  and  no 
' her  cr'nclusions  should  be  drawn  from  its  results. 

As  predicted,  LU  Decomposition  and  Gaussian  Elimination 
always  gave  the  same  order  of  accuracy  (except  for  the  Gauss 

{ 

ian  Full  Pivoting).  Interestingly,  all  of  the  pivot  strate- 
gies of  MFP  (except  for  Full  Pivoting)  chose  pivot  elements 
consecutivelv  '^n  the  diagonals.  It  appears  that  the  YSMP 
probuL^iy  chc  .»e  the  diagonal;  most  a priori  strategies  would 
choose  the  diagonal  for  a pentadiagonal  matrix.  Additional- 
ly, the  actual  lues  of  the  errors  came  very  close  to  those 
of  MPP  which  urd  use  the  diagonal.  There  is,  however,  a dis 
parity  in  the  time  criterion. 

The  LU  Decomposition  should  have  takon  more  time  theui 
Gaussian  Elimination;  but  a check  of  the  program  structures 
would  esqilain  part  of  this  disparity.  Most  of  the  repeated 
operations  of  the  MFP  pivot  subroutines  are  contained  in 
( , other  individual  subroutines;  each  call  to  a FORTRAN 
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subroutine  requires  more  time  than  a simple  ”GO  TO 
statement.  The  call  to  subroutine  causes  a transfer  of  con- 
trol from  the  calling  program  to  the  computer's  operating 
system  in  order  to  find  the  subroutine,  execute  it,  and  re- 
turn to  the  calling  program.  The  pivot  subroutines  in  MFP 
must  frequently  use  some  external  programs  called  FETCH, 
DELETE,  ROMDIV,  and  STORE  which  manipulate  data  in  the  com- 
pacted form  of  the  sparse  matrix  (Appendix  B).  YSMP,  on  the 
other  hand,  is  built  so  that  all  of  the  necessary  progreonming 
for  a specified  step  is  contained  within  the  entire  subrou- 
tine (Ref  14x18) I 

SORDER  - Computes  minimum  ordering. 

NSRORD  - Re-orders  A given  the  ordering 
from  SORDER. 

SSFAC  - Computes  the  symbolic  factor- 
ization of  the  re-ordered  A 
matrix. 

NSNFAC  - Computes  the  numeric  factor- 
ization of  A,  given  its  sym- 
bolic factorization. 

NSBSLV  - Solves  Eq  (l)  given  the  LU 
factorization  of  A. 

None  of  these  subroutines  needs  to  communicate  with  any 
others I they  merely  must  be  executed  in  a prescribed  se- 
quence. Finally,  an  a priori  pivot  scheme  is  basically 
faster  than  a local  pivoter  when  dealing  with  pentadiagonal 
or  tridiagonal  matrices.  Of  course,  the  core  usage  is 
laurger  than  MFP  because  of  these  speed  capabilities. 

S.Wy.  AWff  ;(og  Ny?ct 

One  of  the  important  advantages  of  MFP  is  that  it  was 
easy  to  build  and  test  new  pivoting  schemes  by  using  the 
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same  modularized  subprograms  for  routine  operations.  There- 
fore, the  relative  merits  of  particular  Gaussian  Elimination 
pivot  strategies  could  be  easily  evaluated.  Also,  factors 
such  as  fill-in  and  number  of  deletions  could  be  used  to 
check  the  effective  use  of  data  storage  available.  Since 
SIMULT  similarly  offered  fill-in  and  deletion  monitoring, 
the  Gauss-Jordan  algorithm  by  Key  was  included  in  all  tests 
of  the  MFP  variations. 

Therefore,  the  objectives  of  the  next  phase  were  to 
compare  SIMULT  and  MFP  v;ith  more  arbitrary  matrices  to  find 
the  "performance  crossover  point"  (as  suggested  by  the  Math- 
ematical Theory)  and  to  find  the  optimum  version  of  MFP 
which  not  only  performs  well  but  meets  a prospective  user's 
needs . 
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IV.  Th^  Choice  th<t  Optimuin  Pivot  Strategy 


The  initial  test  phase  confirmed  the  progranning  logic 
for  the  three  major  Gaussian  algorithms  using  standard  test 
matrices!  the  next  phase  used  matrices  which  were  arbitrary 
both  in  value  and  structure.  The  range  of  structures  in- 
cluded some  systems  which  are  ts^pical  problems  in  physics 
and  engineering.  Thus»  the  tests  results  and  the  choice  of 
an  optimum  pivot  strategy  for  Gaussian  Elimination  come  as  a 
result  of  the  solutions  of  practical  problems. 

TtM  di&icussion  of  the  intermediate  test  phase  includes 
a description  of  the  test  matrices,  an  analysis  of  the  re- 
sults with  xespect  to  the  Mthematical  theory,  and  the  pro- 
cess hy  which  the  final  version  of  the  HPP  program  (Gaussian 
Blimination)  was  developed. 

gRtmildAitt  111.1 

All  of  the  next  eight  test  matrices  started  with  ran- 
domly-generated numbers  in  a tridiagonal  structure.  About 
3311  of  any  particular  set  of  numbers  were  negative.  The 
last  five  matrices  contained  an  additional  2%  non-sero 
structure  whose  values  were  randomly  generated;  the  coor- 
^nates  for  these  extra  values  were  also  randomly  determined,. 
This  additional  structure  was  contained  within  bandwidths 
which  normally  ranged  from  ±5  to  diagonals  from  the  main 
diagonal.  The  last  matrix,  however,  had  its  extra  non-sero 
structure  scattered  throughout  the  entire  available  array. 
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Th«s«  t«st  n*tric«s  mmxm  uft«d  to  oxorcis*  the  v&ria- 
tions  of  the  Gaussian  Bliwination  program  (MFP)  and  the 
Gauss- Jordan  Reduction  program  (SIMULT).  The  diagonal  pi vox 
strategy f HFPl,  was  not  used  in  the  intermediate  phase i as 
the  mathematical  theory  pointed  out,  diagonal  pivoting  strat- 
egy is  really  no  strategy  at  all,  and  the  chance  exists  that 
a scro  would  be  found  on  a diagonal  location.  (The  original 
purpose  of  MPPl  was  merely  to  be  the  basic  framework  for  ,^the 
other  pivoting  strategies.)  ^ 

The  enumeration  of  the  test  matrices  and  Itheir  results 
with  SIMULT  and  four  variations  of  NPP  are  contained  in 
Appendix  B. 

of  thft  Peaults 

The  data  which  was  available  from  the  MFP  strategies 
and  the  SIMULT  program  established  three  performance  crite- 
riai  the  order  of  magnitude  of  the  error  (as  calculated  ac- 
cording to  Bq  [29])f  the  niiaber  of  tisws  in  srtiich  a fill-in 
value  was  manifested,  and  the  execution  time  for  the  Gauss- 
ian algorithm. 

Error  Magnitudes.  The  most  consistent  perfonsance  was 
achieved  by  the  Gaussian  Partial  Pivoting  strategy  (MPP2) 
with  an  error  magnitude  on  the  order  of  10'*’^  or  less 
{Table  III).  With  Minimum  Row/)ti.nimun  Colusm  (MPP4),  Mini- 
mum Row/Naximum  Element  (MPP5),  and  SIMULT,  the  error  magni- 
tudes were  functions  of  the  degree  of  "scatter”  of  the  extra 
non-serosi  IVP4  and  MPP5  (which  were  meant  to  reduce  local 
fill-in)  did  work  well  with  the  more  tightly  banded  matrices. 
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Table  III 

Error  Magnitudeii  for  Intermediate  Tests 


Test 

3>latrix 

SIMULT 

MFP2 

MPP3 

MPP4 

MFP5 

# 5 

10“* 

V 

1 

o 

10*^ 

V 

1 

o 

w 

1 

o 

# 6 

io“^ 

10“^ 

w 

H 

1 

o 

H 

10"!^ 

# 7 

10‘1 

in 

1 

o 

lO^^ 

10-13 

1 

o 

-4 

# 8 

10“« 

10“ 

loO 

lO’l 

10-1 

# 9 

10“® 

CM 

1 

o 

10® 

10® 

10® 

#10 

10“® 

10“13 

loO 

lO*^ 

10^1 

#11 

0 

1 

o 

10“i3 

loO 

failed 

10® 

#12 

1 

o 

-4 

10“^® 

10^^ 

lO^l 

10® 

but  did  poorly  with  increasing  disorder  in  the  extra  non- 
sero  structure;  SIMJLT*  on  the  other  hand^  clearly  iBqprc*"d 
froa  10“  to  lO***  with  wore  arVitrary  stojctui’e.  These  ob- 
servations clearly  confiiused  the  predictions  aade  in  the 
Nstbsaatical  Theory,  The  failure  of  MPP4  with  test  eatrix  #11 
was  due  to  a conputer  diagnostic  which  stated  that  an  "infi- 
nite operand"  had  been  chosen.  Since  Miniaua  Row/Miniaua 
Coluan  pivots  without  regard  to  stability,  this  result  is 
not  surprising.  As  for  Pull  Pivoting  (MPP3),  the  error  sag- 
nitudes  were  generally  poor;  this  perforaance  caae  chiefly 
as  a result  of  excess  fill-in. 

Pill-in.  With  the  exception  of  test  natrix  #12  (the 
least  organ! stsd  structure).  Pull  Pivoting  always  created  the 
host  fill-in;  furthermore » the  fill-in  excess  was  generally 


two  or  thr««  tioMS  *s  nuch  for  tho  other  Gaussian  Blioiina- 
tion  strategies  (Table  IV). 


Table  IV 

Pill-in  Tabulations  for  Intermediate  Tests 


Test 

Matrix 

SIMULT 

MPP2 

MPP3 

MPP4 

MPP5 

« 5 

97 

59 

232 

0 

59 

# 6 

97 

61 

233 

0 

61 

# 7 

97 

58 

218 

0 

58 

# 8 

336 

291 

1149 

207 

249 

# 9 

987 

699 

2002 

792 

629 

#10 

871 

930 

2153 

951 

932 

#11 

1247 

1044 

2129 

failed 

717 

#12 

2168 

4038 

3815 

2117 

1713 

Por  tightly-banded  matrices,  the  strategies  which  piv- 
oted for  fill-in  minimisation  did,  in  fact,  fill  in  fewer 
values  than  the  rest  of  the  strategies | however,  for  more 
scattered  structures,  the  reduction  in  local  fill-in  made 
little  differences  that  same  locsd  fill-in  did  coste  into 
play  in  many  more  calculations  to  manifest  further  fill-in} 
and,  as  with  the  "infinite  operand"  case,  some  values  did 
become  subsequent,  unstable  pivot  elements. 

There  ms  a feature  in  the  main  program  which  would 
list  the  pivot  ordering.  The  NPP4  and  MPP5  programs  often 
"jumped"  around  the  matrix  in  successive  pivotss  row  1,  then 
scow  loot  iraw  3,  then  row  89,  for  exaifstle.  As  a result,  the 
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fill-in  lingered  for  many  calculations  and  contributed  to 
errors  many  more  tiroes  than  did  the  fill-in  %om  the  Partial 
Pivoting.  For  example t in  the  case  of  test  matrix  #12,  in 
frtiich  Gaussian  Partial  Pivoting  registered  the  greatest 
aBK>unt  of  fill-in,  the  new  non-zeros  were  localized  about 
the  pivot  elements,  often  eliminated  soon  after  creation, 
and  thus  were  not  involved  in  as  many  subsequent  calcula- 
tions. This  observation  is  in  direct  agreement  with  the 
Mathematical  Theory. 

Execution  Time.  The  slowest  of  the  MFP  routines  was 
always  Pull  Pivoting  (MPP3)  because  of  the  large  number  of 
extra  computation  required  for  the  fill-in  and  the  normally 
time-consuming  searches  for  pivot  elements.  In  ine  tightly- 
banded  cases.  Minimum  Row/Minimum  Column  (MFP4)  chose  ele- 
ments consecutively  on  the  diagonail  for  pivoting,  and  thus 
computed  very  rapidly}  siisilarly.  Minimum  Row/Maximum  Ele- 
ment (MFP5)  chose  the  same  pivots  as  Partial  Pivoting  (MFP2). 
However,  in  general  no  Gaussian  Elimination  variation  ran 
significantly  faster  than  Partial  Pivoting  (Table  V). 

It  is  interesting  to  note  that,  while  time  c<MQparisons 
between  Gauss-Jordan  and  Gaussian  Elimination  are  really  not 
meaningful  from  an  algorithmic  atandpoint,  in  case  of 
tost  matrix  #12,  only  two  orders  of  magnitude  of ‘accuracy 
separated  SIMULT  and  MPP2|  yet  SIMULT  solved  the  matrix 
nearly  twelve  times  as  fast  as  MPP2.  Test  matrix  #12  is  a 
very  uncommon  problem}  but  at  some  point,  the  potential  user 
must  decide  which  sparse  solver  he  must  choose  in  light  of 
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Table  V 

Execution  Times  for  Intermediate  Tests 


Test 

Matrix 

SIMULT 

MFP2 

MPP3 

MFP4 

MFP5 

# 5 

0.26 

0.36 

0.73 

0.37 

0.44 

# 6 

0.27 

0.39 

0.76 

0.37 

0.45 

# 7 

0.30 

0.36 

0.74 

0.43 

0.45 

# 8 

0.33 

1.02 

3.40 

1.00 

1.03 

# 9 

0.66 

1.82 

7.77 

2.38 

1.90 

#10 

0.64 

2.19 

7.79 

3.06 

2.79 

#11 

0.89 

2.46 

9.17 

failed 

2.10 

#12 

1.70 

21.11 

26.82 

9.55 

6.35 

Time  in 

seconds . 

the  relative  disarray  of  his  own  problem. 


Choice  of  Optimum  Algorithm 

The  choice  of  the  Optimum  Gaussian  Elimination  Algo- 
rithm was  derived  from  the  preceding  analysis  the  conclu- 
sions of  which  are  recapitulated  below t 

1.  The  best  accuracy  consistently  came  from 
Gaussian  Partial  Pivoting. 

Disregard  for  stability  in  some  problems 
led  to  poor  accuracy  and  at  least  once  case 
of  division  by  a small  n*iaber  ("infinite 
operand" ) . 

3.  By  localising  all  pivot  choices  (as  in  Partial 
Pivoting),  the  fill-in  is  also  localised  and 
its  corresponding  error  affects  many  fewer 
sttbsoquent  calculations. 
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4.  LArg«  amounts  of  fill-in  still  may  be  an 
isiportant  source  of  error. 

These  conclusions  suggested  the  following  criteria  for  an 
"ideal"  pivot  strategy t 

1.  The  pivot  choice  should  be  from  consecutive 
rows I this  choice  would  help  to  localise  the 
effects  of  fill-in. 

2.  The  column  choice  for  that  pivot  row  should 
initially  attempt  to  minimize  the  number  of 
calculations  and,  hence,  lessen  the  proba- 
bility for  fill-in  occurrence. 

3.  If,  however,  the  value  of  the  pivot  is  very 
SBiiill  with  respect  to  some  number  (called  a 
"Pivot  Tolerance" ) the  element  with  the  lar- 
gest absolute  value  in  the  pivot  row  should 
be  used  as  the  pivot.  This  choice  need  only 
occur  often  enough  to  stabilize  the  system 
when  instability  insidiously  appears. 

As  the  dilemma  of  Pig.  5 indicates,  even  these  ideals  will 
not  yield  a panaceai  however,  they  indeed  provide  adequate 
grounds  for  engineering  tradeoffs  among  the  criteria  of  ac- 
curacy, fill-in,  and  time. 

Therefore,  none  of  the  original  MFF  variations  was 
chosen  as  the  optimum  strategy;  another  strategy  was  devel- 
oped, programmed  and  tested.  This  strategy  was  called 
"Consecutively  Calculated"  Pivoting.  (The  designation  of 
this  strategy  is  "MPPTH,"  sukS  the  subroutine  nane  for  the 
Forward  Gaussian  step  is  called  "IHINKFJt.") 

The  strategy  of  MFPIH  is  that  suggested  above t the  row 
pivot  coordinates  go  consecutively  from  1 to  n,  and  the  col- 
umn coordinate  is  chosen  as  that  eleswnt  in  the  column  with 
ttie  fewest  number  of  non-seros.  However,  if  the  value  of 
the  pivot  is  less  tirian  the  pivot  tolere:t3e  (called  "PIVTOL") 


then  a search  is  made  to  find  the  row  element  with  the  lar- 
gest absolute  value.  The  advantages  of  the  strategy  are 
very  important  to  the  userj 

1.  PIVTOL  can  be  a readable  quantity  (as  is  the 
case  in  the  listed  program  of  MFPTH  in  the 
APIT  Computer  Archive). 

2.  If  the  user  is  willing  to  sacrifice  some  ac- 
curacy, PIVTOL  could  be  chosen  to  be  a small 
number  (0.01,  for  example)  and  the  fill-in 
and  number  of  calculations  would  be  less  than 
those  for  Gaussian  Partial  Pivoting. 

3.  On  the  other  hand,  if  fill-in  is  not  a prob- 
lem, choice  of  a large  value  for  PIVTOL  (100, 
for  example)  would  always  be  driving  the  sys- 
tem towards  more  stability. 

In  fact,  with  large  values  of  PIVTOL,  the  algorithm,  for  all 
intents  and  purposes,  is  the  same  as  Gaussian  Partial  Pivot- 
ing, There  is,  however,  one  important  disadvantage.  Use  of 
a large  PIVTOL  would  force  more  second  searches  for  the  piv- 
ot element;  the  user  must  therefore  be  willing  to  pay  the 
price  of  extra  time  for  extra  accuracy. 

The  variation  MFPTH  was  programmed  and  run;  Table  VI 
shows  its  performance  with  all  test  matrices  as  well  as  its 
core  usage  information. 

The  increase  in  time  is  apparent,  but  not  formidable. 

It  is  clear  that  the  MFPTH  variation  is  a desirable  program 
because  the  user  has  an  input  into  the  ultimate  performance 
for  his  particular'  problem. 

The  next  testing  phase  narrowed  the  scope  of  op>cration 
to  problems  likely  to  be  solved  in  physics;  the  study  also 
gave  rise  to  yet  another  concept  for  the  final  form  of  the 
¥BT  Sparse  Matrix  Solver. 
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Table  VI 

Results  of  the  Consecutively  Calculated  Strategy 


Test 

Matrix 

Error 

Pill-ins 

Deletions 

Tine 

( Seconds ) 

# 1 

10-12 

0 

297 

1.08 

# 2 

10^1 

0 

297 

1.06 

# 3 

10-6 

0 

297 

1.07 

# 4 

CM 

1 

O 

0 

199 

0.71 

# 5 

10-1* 

0 

199 

0.73 

# 

4* 

1 

O 

0 

199 

0.78 

# 7 

10-13 

0 

199 

0.75 

# 8 

10-» 

116 

337 

1.54 

# 9 

10"^ 

455 

474 

2.84 

#10 

10"® 

620 

570 

3.58 

#11 

0 

1 

o 

•H 

715 

602 

4.08 

#12 

10”® 

2686 

1103 

26,56 

PZVTOL  - 

0.01 1 THINKER  Prograa  length  - 

12C2  (Octal) 

SO 


V.  Th<  Final  Algorithm  Compari sop 


Th«  final  phase  of  testing  compared  the  YSMP,  SIMULT, 
and  MFPTH  sparse  matrix  solvers  with  eight  more  test  ma- 
trices! the  structures  of  these  matrices  were  similar  to  the 
exaiq>le  presented  in  Chapter  It  diagonally  dominant,  tridi- 
agonal core  structure  with  flanking  diagonals.  The  analysis 
of  these  tests  also  spasmed  a new  feature  for  the  MPP  pro- 
gram to  improve  speed.  The  discussion  of  the  final  phase, 
therefore,  describes  the  new  teste,  presents  the  new  features 
for  MPP,  tabulates  the  speed  improvements,  and  sums  up  the 
overall  performance  of  the  three  major  sparse  algorithms. 

The  Final  Eight  Test  Matrices 

The  study  of  this  final  test  phase  concentrated  on  the 
performance  of  the  si>arse  solvers  with  matrices  whose  flank- 
ing diagonals  were  originally  located  adjacent  to  the  core 
(as  in  a pentadiagonal  structure)  and  then  displaced  one 
diagonal  at  a tisMi.  (Appendix  P contains  a listing  of  these 
matrices.)  The  pivot  tolerance  for  MFPTH  was  chosen  to  be 
10  so  as  to  pivot  for  accuracy.  Table  VII  summarises  the 
error  magnitudes  and  execution  tisMs  for  the  three  programs. 

The  obvious  result  of  these  tests  is  the  consistent  ac- 
curacy provided  by  the  MFPTH  program » however,  because  the 
pivot  tolerance  was  large,  the  time  needed  to  solve  a system 
was  relatively  long  for  each  test.  What  is  not  included  in 
Table  VXI,  but  listed  on  the  computer  printouts,  was  that 
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TAbl«  VII 

Pin*l  Pcxfonuinctt  Comparisons 


Test 

Matrix 

YSMP 

^ fMULT 

MPPTH 

#13 

Error t 

10’*^ 

lO'*’^^ 

10’*^ 

Timet 

0.12 

0.35 

1.41 

#14 

Errors 

10-* 

10-‘« 

Timet 

0.16 

0.34 

2.13 

#15 

Errors 

10-* 

10*“ 

W 

1 

o 

Timet 

0.21 

0.40 

3.04 

#16 

Errors 

10"^ 

10® 

io-‘< 

Timet 

0.23 

0.76 

4.06 

#17 

Errors 

10’^ 

10’^ 

10’*^ 

Timet 

0.27 

1.05 

4.96 

#18 

Errors 

10’2 

10® 

1 

o 

TiSMS 

0.28 

0.78 

6.09 

#19 

Errors 

10-* 

10° 

10-1“ 

Timet 

0.30 

1.11 

7.03 

#20 

Errors 

10-* 

10“^ 

w 

1 

o 

Timet 

0.30 

1.18 

8.23 

Notsst  YSMP  - LU  Docosiposition,  a priori  strategy. 
SIMJLT  - Gauss-Jordan  Reduction  with  Minimum 

Row/Minimum  Column  pivoting. 

MFPXH  • Gaussian  Elimination  with  Consecu- 
tively Calculated  pivoting  (pivot 
tolerance  ■ 10). 

Error  suignitudes  calculated  as  in  Bq  (29). 

Time  in  seconds. 

ZIBST  for  SIMULT  > 10’*®. 


for  each  of  the  test  matrices  t the  MPPTH  pivoting  strategy 
chose  consecutive  elements  on  the  diagonals  for  the  best 
accuracy.  This  observation  suggested  the  next  configuration 
of  the  HPP  program. 
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Extra  Features  for  MFP 


Motivated  by  the  diagonal  pivot  selections  in  the  pre- 
ceding tests  and  a to  increase  the  speed  of  the  MFPTH 

prograiBt  this  stu'^ent  p egramned  an  additional  strategy. 

The  new  scheme  was  an  a priori  pivoting  strategy  which  chose 
the  pivot  coordinates  from  the  first  non-seros  in  consecutive 
rowst  unless  a sero  appeared  on  the  diagonal*  then  the  en- 
tire diagonal  was  the  source  for  pivot  values.  As  an  aid  to 
the  user*  the  a priori  pivot  subroutine  (called  "APRIORl”) 
was  included  into  the  MPP  program  structure  with  the  THINKER 
subroutine;  as  a result,  the  user  was  given  an  option  for 
which  strategy  he  desired.  If  a diagonally  dominant  system 
were  being  solved,  choice  of  APRIORI  would  yield  good  accu- 
racy with  a relatively  quicker  solution  time;  if  the  APRICMII 

•2 

subroutine  failed  to  give  accuracy  better  than  R • 10 
(Bq  [29]),  the  main  program  would  automatically  reset  and 
begin  again  with  THINKER.  Of  course,  THINKER  could  have 
been  chosen  from  the  beginning. 

§pii^  HfP 

The  solutions  for  each  using  the  a priori  strategy 
gave  precisely  the  same  accuracy  as  in  Table  VII  but  with  a 
measurable  tisM  improvement  (Table  VIII). 

The  designation  for  this  configuration  of  MPP  is  "MPPDP” 
to  indicate  the  "option*  feature.  (A  listing  of  MPPOP  can 
be  obtained  froe  the  APIT  Computer  Archive. ) The  additional 
core  space  required  for  APRIORI  was  only  370  (octal)  loca- 
tional in  terms  of  the  entire  program  load  sise,  the  increase 
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Table 

Time  Comparisons  of 

VIII 

APRIORl 

1 

and  THINKER 

Test 

Timet 

Matrix 

APRIORl 

THINKER 

^Reduction 

#13 

1.18 

1.41 

16S 

#14 

2.11 

2.13 

i 

IS 

#15 

3.02 

3.04 

IS 

1 

#16 

3.99 

4.06 

2S 

#17 

4.90 

4.96 

IS 

#18 

5.83 

6.09 

4S 

#19 

6.78 

7.03 

4S 

1 

#20 

7.87 

8.23 

4S 

1 

is  n«gligibl«  bscauss  nany  of  ths  sans  7CMRTRAN  systsa  rou- 
tinos  used  by  APRIORZ  were  already  present  for  THINKER. 


<?9nclvttjl9J?g 

A user  uight  be  Motivated  to  use  the  SIMULT  program  for 
dispersed,  unstructured  Mtrices  or  the  YSMP  program  for 
tightly-banded  matrices}  this  motivation  comes  as  a result 
of  time  considerations.  However,  the  HPPOF  program  deoion- 
strated  the  capability  to  solve  a very  wide  variety  of 
matrices  with  consistently  better  accuracy  than  either 
SIMULT  or  YSMP.  Also,  the  user's  flexibility  in  controlling 
the  progress  of  the  solution  with  MFPQP  is  a very  important 
consideration . 

It  must  be  emphasised  that  to  attain  the  high  degree  of 
accuraesr,  the  MPPOP  program  had  to  allow  larger  amounts  of 


i 
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in  both  the  a priori  and  local  strategies  than  most 
classical  sparse  matrix  studies  conclude  would  be  tolerable 
ior  a very  lay.ge  problem  (n  » .10,000»  for  example).  Th« 
average  speed  improvement  due  to  the  introduction  of  the 
a priori  pivoter  was  only  but  for  a class  of  problems 
that  are  tightly  banded,  such  as  a pentadiagonal , the  net 
iiq>rovement  was  a high  lOH,  With  these  factors  of  fill-in 
and  time  in  icind,  the  next  segment  of  this  thesis  addressed 
the  problems  of  a new  data-packing  scheme  and  an  even  fur- 
ther iiqprovement  in  the  time  factors. 
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VI • SDAr«c  Packing  in  the  CPC  6600  Coroputgr 

Th«  choic«  of  th«  MPPOP  Sparse  Solver  as  the  optimum 
Gaussian  Elimination  algorithm  caoae  not  only  because  of  the 
higher  theoretical  accuracy  which  it  provides  but  also  be- 
cause it  was  highly  suitable  for  the  CDC  6600  Computer. 

While  all  of  the  tested  programs  were  written  in  the  FORTRAN 
computer  language  (which  is  standard  for  most  large  comput- 
ers), their  accuracies  wsre  enhanced  to  a great  degree  by 
the  numeric  superiority  which  the  CDC  6600  has  over  ammy 
cos^juters.  This  chapter  deals  more  closely  with  such  com- 
puter capabilities  as  they  pertain  to  a new  packing  scheme 
developed  specifically  for  MFPOPi  a consequence  of  this 
packing  scheme  is  that  it  justifies  the  allowance  for  fill- 
in  idiich,  in  many  other  cosiputers,  would  be  intolerable. 

To  help  in  the  understanding  of  the  new  packing  scheme, 
a review  of  the  basic  hFP  packing  method  is  presented,  fol- 
lowed by  the  description  of  the  implementation  of  the  new 
packer.  The  final  configuration  of  the  Gaussian  Biisdnation 
algorithm  is  also  described  since  it  is  a streamlined  ver- 

• 

sion  of  the  modular  MIT  concept.  A summary  presen vS  the 
overall  benefits  of  this  student's  program  as  it  has  been 
run  on  the  CDC  6600  Cos^Mtwr. 

P.ightofl.  §stnia> 

In  order  to  pack  only  the  non-seros  of  the  sparse  ^ 
aatrix,  the  MPP  program  needed  several  FORTRAN  arrays.  (The 
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r*«d«r  is  dirsctsd  to  Appendix  B for  the  complete  packing 
method  and  the  flow  chart  for  the  algorithm. ) The  arrays 
were  as  follows t 

lA  - The  array  (sise  N)  sdiich  contains  the 
locations  of  the  starting  points  of 
the  rows. 

JA  - The  array  which  holds  the  column  coor- 
dinates for  each  non-sero  in  the  ^ 
matrix.  (Sise  h the  number  of  non-seros, 
denoted  "NNZR.") 

A - The  array  which  contains  the  non-seros 
(sise  NNZR). 

ISTAT  •'  The  array  (sise  N)  which  contains  the 
number  of  non-seros  in  each  row. 

As  part  of  the  forward  Gaussian  algorithm » the  arrays  JSTAT 

(a  column  status  vector),  JCQL  (a  working  vector),  IPIV  (row 

ordering  array),  and  JPIV  (column  ordering  array)  were  also 

required.  Thus  the  minimum  data  space  required  for  the 

sparse  matrix  and  its  solution  was  the  following  set  of 

arrays t 


6 integer  arrays,  sise  N 
1 integer  array,  sise  WiZR 
1 floating-point  array,  sise  NNZR. 

The  sise  of  NNZR  for  any  routine  must  be  judiciously  chosen; 

a certain  allowance  for  fill-in  is  required.  As  a rule  of 

thumb,  the  following  formula  f c ' NNZR  was  used  in  developing 

the  final  strategy; 


NNZR  < (5fl)  X (N^)  x 2 (39) 

Tb  implement  MPPOP  on  any  computer,  the  information 
from  each  of  the  arrays  is  necessaryi  but  one  should  note 
that  if  the  largest  nuOber  of  equations  to  be  solved  is 
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liait«d,  th«  largast  value  stored  in  any  of  the  elements  of 
the  integer  arrays  will  be  very  small  relative  to  the  largest 
calculable  integer  for  that  computer.  Since  the  CDC  66CX) 
compter  had  a word  sise  of  60  bits  (which  is  nearly  twice 
aa  large  as  the  single  precision  word  on  most  other  comput- 
eiTs),  for  an  N on  the  order  of  1000 » only  the  right  most  ten 
bits  of  each  word  would  be  used;  the  remaining  50  bits  would 
be  wasted.  It  is  the  crux  of  the  new  packing  scheme , there- 
fore* to  use  as  much  of  the  integer  word  as  possible  to 
store  information. 

New  Packinfl.  .agftaHt 

The  new  packing  schenie  involves  manipulation  of  the 
bits  of  the  arrays  for  both  the  six  N-sized  arrays  arid  the 
two  NNZR-sised  arrays. 

N-sised  Arrays.  One  can  envision  using  the  extra  50 
bits  of  a computer  word  in  the  CDC  6600  as  room  to  store 
other  arrays!  that  is*  by  subdividing  or  "segmenting**  the 
bit  structure  of  the  wordu  in  only  one  array*  the  informa- 
tion of  many  arrays  can  be  coif>acted.  To  successfully  iii- 
plement  this  idea*  the  programser  mast  keep  in  mirxl  that 
Octal  arithmetic  is  used  in  the  COC  dtXX)  and  the  storage  and 
retrieval  of  information  from  a segmented  word  must  be  han- 
dled carefully. 

In  the  modified  MFPOP  program*  tlie  six  N-sised  arrayo 
are  packed  in  grotqas  of  threei  JSIAX*  XStAX*  and  IA|  and 
JOOLf  ZPXV*  and  JPXV.  The  array  variable  name  is  called 
"XMCIO*  to  indicate  the  integer  data  structure.  The  first 
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N v*lu«s  of  INTBG  contain  JSTAT,  ISTAT,  ajid  lA  in  throe 
groups  of  20  bits.  The  value  of  INTEG(N4>1)  contains  a spe- 
cial value  used  by  the  packing  subroutine,  and  the  value  of 
INTEG(N4'2)  holds  the  initial  NNZR  value.  The  next  N loca- 
tions in  INTEG  contain  JCOL,  IPIV,  and  JPIV  in  three  groups 
of  20  bits. 

Since  the  value  of  lA  is  usually  a large  number,  the 
position  of  lA  in  the  INTEG  word  is  very  important!  since  it 
occupies  the  right-isost  20  bits,  then  the  value  of  lA  can  be 
stored  as  if  it  were  a decimal  number.  But  such  is  not  the 
case  for  JSTAT  and  ISTAT.  However,  these  two  status  vectors 
are  built,  incremented,  and  decromented  only  one  unit  at  a 
timei  to  add  or  subtract  a "1"  in  the  segment  for  ISTAT,  a 
specific  octal  number  (4000000g)  is  added  to  the  entire 
IN11SG  word.  To  continue  the  exan^le,  where  the  old  state- 
ment was  programoMd  as  ''ISTAT(I)«ISTAT(I)  4-  1,"  the  new 
statement  reads  "1N1EG(I)«INTEG(1)  ♦ 4000000B."  (The  "B"  is 
the  PORTRAN  definition  of  an  OCTAL  constant. ) A similar 
octal  number  increments  JSTAT. 

To  extract  tlie  specific  data  for  a given  row  K,  the 
value  of  INIBG(K)  is  first  placed  into  a working  register. 
Then,  by  use  of  two  supplied  CDC  functions,  the  proper  in- 
formation can  be  unambiguously  retrieved!  to  extract  IA(K), 
the  value  of  INTBG(K)  is  masked  with  an  ".AND."  function 
over  the  ripht-most  20  bitst  to  extract  ISTAT(K),  the  value 
of  INXBG(K)  is  SHXPT-ed  20  bits  to  the  right  and  then  masked 
with  the  .AMD.  function  (Ref  3t2-12|  8-4).  Por  tlte  arrays 
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JCOL(  IPIV,  and  JPIV^  a similar  extraction  method  is  used; 

C*  however,  to  store  the  values • an  intermediate  register  must 

COM  into  play.  The  register  is  set  to  zero;  the  computed 
value  of  JCCH^  or  IPIV  is  inserted  and  left-shifted  the  appro- 
priate number  of  bits;  then  the  contents  of  the  register  is 
added  to  the  appropriate  INTEG  element.  (^>pendix  G is  the 
listing  of  the  program  with  the  bit-sliced  packing. ) 

The  savings  on  the  N-sized  arrays  is  very  important  for 
large  linear  systems;  if  NvlOOO,  then  the  p::eviously  lAsed 
6000  locations  for  the  arrays  is  reduced  to  only  2000.  Thus 
the  allowance  of  4000  extra  data  locations  is  made  available 
for  fill-^n.  In  practici^,  only  19  of  the  20  bits  per  array 
are  used;  thi^  restriction  is  n(>ce&sary  because  the  left- 
sost  bite  of  JSTAT  and  JCCX  are  sign  bits  of  the  CDC  6600 
computer  words.  Manipulation  of  the  sign  bit  may  create 
problems  for  the  data  stored  in  the  entire  word.  With  this 
configuration,  thr  maximum  number  of  equations  is  initially 
Iltdted  tc  (2^^-l)  or  524,287;  as  large  as  this  number  is, 
the  NNZR-sized  arrays  place  a much  more  stringent  restric- 
tion on  the  maxi.fv:m  murber  of  eq^iations^ 

HWfflR-iigfll.  Aam.*  i^r^her  reduction  in  data  storage  . 
reqiuirenents  is  4klso  possible  by  combining  the  two  NNZR- 
sised  arrays,  A and  JA,  sines  every  A value  has  A corres- 
ponding JA  table  entry.  The  procedure  is  similar  to  the 
technique  used  in  tlie  M-eised  arrays,  except  that  floating- 
point numbers  and  integer  mtnbers  are  mi^cad.  In  the  CDC 

0 6<600  CoiqMiter,  a floating-pcint  number  is  stored  with  the 
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left-most  12  bits  as  the  sign  and  the  exponent,  and  the  re- 
maining 48  bits  as  the  mantissa.  In  the  new  packing  scheme, 
the  right-most  ten  bits  of  the  mantissa  are  masked  to  zero, 
and  then  the  integer  value  of  JA  is  inserted  by  using  a 
logical  function.  The  array  which  takes  the  place 

of  A and  JA  is  called  REALS. 

The  first  N locations  of  REALS  contain  the  b vector 
elements,  and  the  next  NN2S^  contain  the  compacted  A and  JA 
data.  To  retrieve  a value  for  A,  the  value  of  REALS  is 
fetched,  and  then  the  last  ten  bits  are  masked  off;  a real, 
floating-point  nimber  is  the  result.  To  retrieve  a value 
of  JA,  the  value  of  REALS  has  the  left-most  50  bits  masked, 
and  an  integer  value  is  the  result. 

There  is  one  very  important  advantage  to  this  packing- 
features  the  storage  required  for  each  non-zero  aixi  each  new 
fill-in  is  t \lf  of  what  the  old  scheme  required.  There  are, 
however,  three  noteworthy  disadvantages!  1)  with  only  ten 
bits  allowed  for  JA,  then  the  maximum  number  of  equations 
vrtsich  car«  be  solved  is  further  restricted  to  (2^^-l)  or  1023; 
2)  masking  off  ten  bits  from  the  floating-point  number 
decreases  the  allowable  accuracy  vo  a maximum  of  only  11 
decimal  places  instead  of  14;  3)  the  real  value  stored  in 
the  mantissa  is  no  longer  rounded  but  truncated  to  38  bits. 
While  the  maximum  number  of  equations  can  be  increased  by 
changing  the  masks  for  A and  JA  (to  11,  12,  or  13  bits)  the 
maximum  accuracy  is  accordingly  decreased. 

The  sacrifice  for  accuracy,  however,  is  not  costly. 


Th«  MFt  pivoting  strategy  can  b«  very  sensitive  to  stability, 
and  thu4!^  f.t  conpe'isates  for  Ihese  induced  aachine  inaccura- 
cies. Table  IX  shows  the  performance  changes  manifested  by 
the  new  packing  strategy.  (The  new  variation  is  called 
**GBBIT'*  to  dekKit^  "Gaussian  Bliminaxion  with  Bit-slicing.") 


Table  IX 

Comparison  of  MFPOP  with  CEBIT 
on  Some  Test  Matrices 


Test 

Matrix 

Errors 

MFPOP 

CEBIT 

# 1 

10“^^ 

io"ii 

# 3 

io-« 

10-« 

# 4 

lO’^^ 

1 

o 

# 6 

10*^^ 

1 

o 

#10 

10“® 

10-10 

#15 

10-1“ 

1 

o 

#18 

10-1“ 

lo'ii 

#20 

10-1“ 

io“ii 

Streamlined  Aloorithm 

After  the  final  tenting  of  CEBIT  configuration,  an  at- 
tempt was  made  to  further  improve  the  speed  of  the  aJ.gorithm. 
As  poicthd  out  in  ChS4)ter  III,  the  many  calls  to  subroutines 
by  the  Gaussian  Forward  pivoters  did  use  up  mucki  timei  while 
this  use  of  modular  subroutines  was  a tresMndous  asset  in 
the  testing  phase  of  this  thssis,  the  final  "production* 
model  would  have  been  unnecessarily  slow.  For  the  final 
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configuration f thoroforct  th«  subroutinos  were  removed,  and 
their  logic  etructuree  were  programmed  within  the  Gaussian 
subroutines  THINKBK,  APRICWI,  and  GAUSSBX.  This  program 
modification  also  saved  extra  time  in  that  only  the  logic 
necessary  at  a particular  step  in  the  algorithm  was  used 
and  not  the  all>encoaq>assing  logic  of  the  subroutines  DBLBTB, 
STOR£,  and  FETCH  (/^>pendix  B).  The  numerical  accuracy  of 
this  final  configuration  is  naturally  the  same  for  CEBIT} 
but  Table  X shows  the  improvement  in  the  time  usage  over 
CEBIT  and  HFPOP.  (The  final  configuration  is  called  "SMART** 
to  denote  the  **Sparse  Matrix  Algorithm  Research  Thesis.") 

Table  X 

Time  Isqprovemen «.  of  the  Streamlined  Algorithm 


Test 

Matrix 

MFPOP 

Time  ( Seconds )t 
CEBIT 

SMART 

#13 

1.18 

1.04 

0.75 

#14 

2.11 

1.77 

1.39 

#15 

3.02 

2.50 

2.02 

#16 

3.99 

3.25 

2.67 

#17 

4.91 

4.03 

3.10 

#16 

5.83 

4.75 

4.01 

#19 

6.78 

5.57 

4.69 

#30 

7.87 

6.50 

5.41 

Interestingly  enough,  the  CEBIT  configuration  showed 
an  average  improvement  of  nearly  17%  by  itself.  One  reason 
for  this  increase  is  that  the  time  for  the  fast  register 


functions  (SKIPTj  .AND.,  snd  .OR.)  sre  much  less  than  the 
fetch  cosntands  from  the  relatively  slow  core  roemory.  Thus 
the  computer  needs  only  to  fetch  one  number.  REALS(M},  to 
attain  both  the  values  for  JA(M)  and  A(K).  Overall,  the 
SMART  configuration  showed  an  improvement  of  nearly  33%  in 
time. 

1*hile  the  SMART  routine  was  still  slower  than  Sherman's 
YSMP  or  Key's  SXMULT,  the  accuracy  demonstrated  that  this 
program  can  be  a competitive  sparse  matrix  solver  for  very 
large  programs.  A scaled  version  of  SMART  was  run  on  the 
APIT  CDC  6600  computer  with  a 1000~by >1000  pentadiagonal 
matrix.  The  error  was  on  the  order  of  10°^^}  but  the  most 
obvious  result  was  the  total  core  usage  required i only  64K 
words.  With  the  packing  schemes  of  YSMP  and  SIMULT,  to  run 
the  same  problem  would  have  required  considerably  more  core 
storage  (well  over  lOOK  words).  For  problems  not  quite  as 
large  as  NkIOOG,  the  SMART  program  could  be  used  on  the 
XNISRCXJK  terminals  st  AFIT  and  AFWL  where  the  core  limita- 
tion is  set  at  60K.  Clearly,  the  Gaussian  Elimination  pro- 
gzammed  for  this  thesis  is  an  adequate  computational  tool. 


VII.  Conclusions  and  Reconunendations 


Thtt  suBsnary  of  this  thesis  includes  a discussion  of  the 
attainsient  of  the  thesis  objectives » a discussion  of  the 
outcoioe  of  the  algorithm  programmed  by  this  student,  and  a 
list  of  subject  areas  for  further  study. 

The  comparison  of  the  two  existing  sparse  matrix 
solvers  (the  LU  Decomposition  in  YSMP  and  the  Gauss-Jordan 
Reduction  in  SIMULT)  pointed  out  two  clearly  definable  areas 
idicre  the  accuracy  of  one  program  was  ouch  better  than  that 
of  the  other}  YSMP  worked  well  for  tightly>banded  matrices 
while  SlIfJLT  favored  the  more  unstructured  systcitms.  Dnfor> 
tunately,  the  areas  in  which  the  programs  worked  their  best 
did  not  overli^;)}  furthermore,  the  matrix  structure  of  the 
example  in  C!u^>ter  l--a  very  coumon  structure- -was  not  con- 
tained in  either  region. 

The  correlative  program  of  the  Gaussian  Blimination 
with  strategic  pivoting  (called  SMART)  not  only  filled  the 
void  but  included  areas  of  performance  common  to  both  YSMP 
and  SIMULT.  The  pivot  echMes  available  with  SMART  (Consec- 
utively Calculated  or  an  a priori  pivot)  yield  a potentially 
serious  consequences  the  possibility  exists  that  fill-in  may 
occur  more  frequently  than  in  the  programs  of  other  methods. 
In  fact,  these  pivot  strategies  conflict  somewhat  with  the 
O thoughts  of  the  writers  of  many  articles  in  Sparse  Matrix 
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literature  (i.e.*  there  is  a great  emphasis  on  fill-in  mini- 
misation in  many  of  these  papers).  This  apparent  conflict 
was  resolved  in  two  ways  which  justify  the  use  of  these 
pivot  schemes!  1)  by  an  examination  of  the  word  structure 
in  the  computer  at  hand  and  2)  by  the  construction  of  the 
new  packing  scheme  for  this  thesis. 

Computer  Word  Structure.  The  preponderance  of  comput- 
ers used  for  sparse  matrix  study  in  the  past  decade  have 
only  32-bit,  single-precision  words t to  attain  good  accuracy 
for  large  sparse  systems,  the  use  of  double-precision  FX>RTRAN 
is  a necessity.  Therefore,  the  total  number  of  distinct 
storage  locations  is  cut  in  half.  Thus  the  concern  for  fill- 
in  growth  in  these  kinds  of  comF..ters  is  quite  valid.  On 
the  other  hand,  the  CDC  6600  provides  a 6C-bit,  single-pre- 
cision word  which  allows  essentially  the  same  accuracy  as 
most  32-bit-per-word  computers  at  double-precision.  As  a 
result,  much  sKire  extra  space  is  available  if  fill-in  grows 
to  large  proportions. 

Packing  Scheme.  Additionally,  the  bit-sliced  packing 
scheme  used  in  SMART  further  reduces  the  requiresMnts  for 
data  storage  by  combining  information  from  several  data 
arrays  into  a single  data  array.  In  handling  the  coaq>acted 
A matrix  alone,  the  SMART  algorithm  uses  nearly  50fl  less 
storage  space  by  combining  the  ^ values  and  their  respective 
column  index  pointers  into  divisions  of  the  same  arrays. 

(this  new  packing  scheme  thus  fulfils  the  second  major  objec- 
tive of  this  thesiss  the  efficient  use  of  computer  core 


storage. ) 


( Thus  any  conflict  with  the  proponents  of  fill-in  reduc- 

tion is  avoided  because  the  SMART  program  as  it  is  run  on 
the  CDC  6600  computer  can  clearly  be  allowed  to  pivot  for 
accuracy  or  calculation  reduction  rather  than  strictly  for 
fill-in  minimization i:  the  extra  core  space  is  readily  avail- 
able. 

Critique  of  the  Gaussian  Elimination  Program 

There  are  several  factors  to  discuss  about  the  Gauss- 
ian Elimination  algorithm  as  it  is  programmed  in  SMARTi  the 
factors  discussed  deal  'dth  both  tangible  and  intangible 
considerations . 

Tangible  Advantages.  The  accuracy  of  SMART  is  well- 
( documented  over  a wide  scope  of  linear  systems.  Even  for 

very  large  systems,  the  core  usage  for  SMART  is  small  when 
cosfxared  to  the  two  existing  sparse  solvers  tested.  Also, 
inasmuch  as  the  user  has  an  option  on  the  pivot  strategy 
(consecutively  calculated  or  a priori)  and  a choice  for 
pivot  tolerance  (for  accuracy  or  some  fill-in  reduction), 
the  SMART  program  can  satisfy  a large  range  of  required 
capabilities  very  easily. 

Tangible  Plsadvantaoe.  The  single  drawback  of  SMART  is 
the  tine  it  takes  to  solve  a problesi.  The  reason  for  this 
tine  excess  is  related  to  the  packing  and  core  usage  designs! 
to  keep  core  usage  low,  much  of  the  array  of  data  must  be 
^ relocated  at  each  incidence  of  a fill-in  or  deletion.  For 

example,  if  a fill-in  were  to  occur  nesu:  the  top  of  the  data 
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array,  thi<>  subsequent  data  are  all  shifted  down  one  location 
to  aake  room  for  the  fill-in.  The  array  is  similarly 
shifted  up  one  location  for  each  deletion.  In  the  context 
of  a large  linear  system  whose  solution  may  require  aome 
fill-in,  these  data  manipulations  for  the  sake  of  core  stor- 
age translate  into  much  extra  computer  execution  time. 

Intangible  Advantages.  Even  with  long  execution  times, 
programs  which  require  less  core  space  in  a computer  are 
often  given  higher  priority  for  execution!  as  a result,  the 
outnut  from  the  SMART  program  would  be  finished  and  into  the 
hands  of  the  user  much  sooner  than  for  YSMP  or  SIMULT  for 
the  same  large,  sparse  matrix.  This  "turn  around  time"  can 
be  a very  important  element  in  a user's  computational  needs. 
(The  proper  analysis  of  this  concept  lies  in  understanding 
the  operating  system  of  the  particular  ccsiputer  in  use.)  In 
any  case,  in  a large  multiprogramiied  computer  environment 
(such  as  AFIT  or  APML) , it  is  generally  harder  to  get  large 
amounts  of  core  at  a given  time  than  it  is  to  get  extended 
execution  time.  Another  intangible  benefit  of  SMART  is  that 
Gaussian  Elimination  is  very  easily  studied | thus  the  algo- 
rithm makes  SMART  highly  editable  to  other  thesis  study. 

Suggested  Areas  for  Further  Study 

Zn  the  entire  field  of  Sparse  Matrix  research,  there 
are  other  considerations  whieti  have  been  applied  to  other 
sparse  matrix  solution  techniques.  As  a way  to  mention  some 
of  these  factors  and  how  they  might  apply  to  the  SMART  algo- 
rithm, the  following  rscomneniJations  sure  presented! 


Reduction  of  Execution  Time,  l)  It  would  be  interest- 
ing to  attempt  a Consecutively  Calculated  Pivoting  strategy 
in  the  faster  Gauss-Jordan  Reduction  and  LU  Decomposition 
programs  of  Key  and  Sherman.  2)  The  packing  scheme  of  SMART 
could  be  modified  so  that  the  tiBe>^consuming  ''shuffle"  dur- 
ing deletions  and  fill-in  could  be  eliminatedt  by  use  of  a 
"linked  list"  table,  elements  of  a row  need  not  be  stored 
adjacent  to  each  other;  the  table  list  would  contain  the 
coiqputer  "addresses"  of  consecutive  row  elements.  Bit-slic- 
ing could  be  applied  here  so  that  the  linked  list  table 
would  not  require  Inordinate  amounts  of  extra  working  space. 

Iterative  Techniques.  1)  While  this  thesis  dealt  only 
with  direct  solutions  of  sparse  itatrices,  it  would  be  appro- 
priate to  conduct  a thorough  investigation  of  iterative 
techniques  and  compare  their  results  with  SMART.  2)  Many 
srriters  suggest  using  iterative  improvers  for  systems  which 
are  solved  with  poor  accuracy!  the  direct  solutions  can  be 
used  as  the  starting  points  for  the  iterations.  Along  these 
lines,  consideration  should  be  given  to  the  case  where  a 
particular  solution  to  Bq  (1)  is  found  the  first  time  using 
SMART;  then,  some  elesnnts  of  ^ or  ^ might  be  changed  to  re- 
flect subtle  differences  in  the  suxthematical  model.  An  iter- 
ative solver  could  use  the  first  accurate  solution  from 
SMART  as  a starting  point  for  tlie  solution  of  the  modified 
ss^stem.  Such  a technique  sdll  provide  much  faster  solutiors 
than  complete  recomputatioo  top  either  SMART  or  the  iterative 
solver  starting  from  scratch. 


Ricx>roua  MAthcinatical  Tests,  l)  For  a class  of  near- 
singular matrix  systems,  it  may  be  necessary  to  interface  a 
direct  solution  with  an  iterative  solution  as  described 
above I the  useful  result  would  be  an  idea  of  the  true  extent 
for  which  the  Gaussian  sparse  solutions  are  applicable. 

2)  In  order  to  handle  problems  which  transcend  the  well -con- 
ditioned systems  common  to  physics  and  engineering,  the  pro- 
gramming of  a scaling  algorithm  (as  in  !iq  [3^])  would  allow 
the  more  theoretical  systems  to  be  tested.  3)  Additional 
study  on  algorithms  for  special  matrix  structures  (such  as 
symmetic  or  symmetric  zero  structure)  is  a naturally  follow- 
on  to  general  matrix  oolutiono. 

Frooram  Adaptability,  l)  Since  sparse  matrices  can 
com  as  a result  of  finite  differencing  techniques  for  solv- 

f 

ing  differential  equations,  it  stay  be  appropriate  to  con- 
struct cos^Mter  programs  which  can  generate  the  sparse  ma- 
trices given  the  differential  equations  and  the  boundary 
conditions.  An  ii^wrtant  suggestion  in  this  case  is  to 
standardise  the  data  formats  so  that  the  sparse  generator 
programs  can  interface  directly  with  the  sparse  solver. 

2)  To  carry  the  analogy  one  step  further,  one  can  conceive 
of  one  larc.ie  cosipater  program  which  generates  the  sparse 
matrix,  solves  it  with  direct  methods,  and  improves  the 
answer  witli  iterative  techniques.  With  such  a large  concept, 
exploitation  of  the  computer's  operation  system  would  be  a 
vsLluable  aid  to  this  end. 

( 
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would  b«  of  groat  value  to  attempt  to  execute  SMART  both  in 
aingle-precision  and  double '•precisioi'i  on  a computer  other 
than  the  CDC  6600;  the  performance  degradation  by  using  a 
32«bit -per -word  computer  would  be  of  particular  interests 
It  would  be  necessary  to  make  changes  to  the  bit-sliced 
array  packing  depending  on  the  sophistication  of  a particu- 
lar FORTRAN  compiler  with  such  computers. 


In  summary,  the  most  io^rtant  result  of  this  thesis  is 
the  development  of  the  spairse  ouitrix  solver  SMART.  The  pro- 
gram gives  the  user  much  flexibility  in  the  conduct  of  a 
particular  problem's  solution.  The  algorithm  of  SMART  pro- 
griimml  on  the  CDC  6600  computer  can  provide  accurate  solu- 
tions from  a very  cospact,  efficiently  used  core  structure 
for  a wide  rarige  of  linear  system  structures. 
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impend ix  A 

Example  of  the  Need  for  Strategic  Pivotino 
Givent  0.0001  Xj^  + 1.00  X2  = 1^00 

.1.00  Xj^  + 1.00  X2  = 2,00 

Desk  calculator  solution:  Xj^  s 1.00010001 

Xg  « 0.99989998 

"Machine”  limitation:  Accuracy  is  limited 

to  three  places. 

Case  I:  Gaussian  Elimination  without  Pivoting 
Operation  1 - 1.00  + 10,000  X2  •■=  10,000 

0.0  x^  - 9,999  X2  = -9,998 

Operation  2 - 1,00  x^  ♦ 10,000  X2  » lOjOOO 

0.0  Xj^  + 1.00  X2  * "1.00"  (round-off) 

Solution:  s 0.00  and  X2  « 1.00 

Case  II:  Elimination  with  Row  Interchange 
X . 00  x^  ♦ 1 . 00  ^2  * ^ • 00 

0,0001  X]^  ♦ 1.00  X2  « 1.00 

Operation  1 * l.OO  x^  1.00  X2  ■ 2.00 

0.0  x,  + "1.00"  X2  * 1.00 

(round-off) 

Solution:  Xj^  ■ 1.00  and  X2  » 1*00  ("Perfect"  to  within 

round-off  accuracy 
of  desk  calculator.) 

(Prom  Ref  7:34) 
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APPENDIX  B 


Flow  Charts  for  MFP 
Subroutines 


Appendix  B 

Flow  Charts  for  MFP  Subroutines 


General  Information 

This  app>endix  contains  the  logical  flow  charts  for  the 
important  subroutines  in  the  MFP  sparse  inatrix  solver:  PACKl, 
FETCH,  STORE,  DELETE,  GSSGEN,  and  GAUSSBX.  The  program 
GSSGEN  is  the  forward  Gaussian  Elimination  with  a general 
pivot  strategy  which  gave  rise  to  the  v£uriations  of  MFP. 

(A  FORTRAN  listing  of  the  entire  MFP  program  can  be  obtained 
from  the  AFIT  Computer  Archive.) 

PACKl 

PACKl  compresses  the  A matrix  into  a compact  form, 

reads  in  the  b vector,  and  generates  status  information. 

The  FORTRAN  arrays  used  are  the  following: 

lA  - The  starting  address  of  the  i-th  row, 

JA  - The  column  coordinates  for  the  A values. 

ISTAT  - The  number  of  non- zeros  in  the  i-th  row, 

A - The  column  array  of  the  A matrix  in 
compressed  form. 

B - The  constant  vector. 

Fig,  B-1  shows  the  packing  of  only  the  non-zeros  in  a sample 
4-by-4  system.  The  flow  chart  (Fig.  B-2)  depicts  only  the 

e 

essential  logic  for  packing  and  not  the  error  checks  which 
are  contained  in  the  program. 

The  main  program  and  all  of  the  functional  subroutines 
use  the  arrays  generated  in  PACKl.  This  . ')nnation  is  ex- 
changed between  programs  by  way  of  a COMMON  statement. 
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Fig,  B-1,  Packing  Scheme  for  MFP 
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INEW  > The  row  number  of  the  next  set  of  data. 

J1£PT  - The  leftmost  coordinate  of  first  piece  of  data. 
JL2NG  - How  many  non->zeros  will  be  read. 

KSTRT  - The  staring  address  of  the  next  row. 

I > The  index  of  the  current  row  being  packed. 
IA(N4-1)  • The  location  aifter  the  last  non-zero. 


Pig.  8-2.  Flow  Chart  of  PACKl 


FETCH 


To  attaiin  the  value  of  an  ^ call  to  FETCH  will  re- 

turn with  the  value  or  zero  if  a value  is  not  found.  Since  a 
con^>arison  with  a floating  point  zero  in  FORTRAN  is  not  al ' 
ways  valid ^ a logic  flag  called  ZERO  is  set  to  TRUE  if  no 
value  is  found}  this  actually  saves  time  since  a logic  check 
is  faster  than  am  arithmetic  compaorison.  The  call  to  subrou- 
tine requires  the  coordinates  I and  J;  the  value,  the  condi- 
tion of  the  logic  flag,  and  a vailue  called  IHOLD  are  returned 
In  subsequent  subroutines,  IHOLD  is  used  because  it  contains 
the  present  address  of  a^^j;  if  a^j  must  subsequently  be  de- 
leted or  a iiew  value  stored  into  a^j,  IHOLD  tells  immediately 
vdiere  that  value  must  go.  Thus,  the  program  is  spared  the 
extra  file  search  for  the  location  of  aj^j  for  these  other 


subroutines 


DELETE 


The  subroutine  DELETE  removes  from  the  compacted  form 
of  the  A matrix  any  value  which  will  be  eliminatea  in  the 
Gaussian  forward  process  or  any  pivot  element  which  has  been 
normalized  and  is  understood  to  be  exactly  1*0  in  value. 

Any  element  a^^j  can  be  eliminated  by  DELETE;  however, 
it  is  usually  the  case  that  the  value  IHOLD  contains  a nvim- 
ber  which  is  the  address  of  the  a^^j  to  be  eliminated.  Thus 
another  search  of  the  row  is  not  necessary.  There  is  a 
safety  check t the  coordinate  J is  checked  with  the  value  of 
JA( IHOLD)  to  be  sure  that  the  correct  element  is  to  be  elimi- 
nated. If  this  test  fails,  the  subroutine  merely  reverts  to 
a row  search.  The  following  variables  are  defined  for  use 
in  the  flow  chart  (Fig.  B-4)t 

NDEL  - The  address  of  the  value  to 
be  deleted. 

II£L  - A counter  to  track  the  number 
of  deletions  which  the  forward 
Gauss  subroutine  must  make. 

. Once  NCGL  is  computed,  the  airrays  JA  and  A aire  all  shuf- 
fled up  one  location  starting  at  JA(NDEL)  and  A(NDEL) . Then 
the  starting  addresses  of  rows  (i'fl)  through  (ntl)  are  dec- 
remented one  place. 

The  counter  IISL  can  be  used  to  check  the  relative  per- 
foi'icance  of  the  various  pivoting  strategies.  Once  the  final 
form  of  HFP  has  been  establishes^ » the  counting  of  IDBL  is  no 
longer  needed  for  the  user's  information. 


SigSB 


The  subroutine  STCMIE  places  a computed  value  back  into 
the  ^ matrix  or,  if  the  situation  dictates,  it  will  create 
space  for  a fill-in  value. 

As  with  the  subroutine  I^LETE,  it  is  usually  the  case 
that  the  value  of  IHOLD  contains  the  address  of  the  a^^j 
where  the  data  must  be  stored;  a similar  safety  check  is 
performed.  If  the  safety  check  fails,  a standard  row  search 
is  coi^pleted.  After  a thorough  search,  the  subroutine  de- 
faults into  the  '*fill-in'‘  mode. 

Por  fill-in,  the  program  oust  find  the  coordinate 
before  which  the  data  must  be  entered;  then,  all  of  tne  sub- 
sequent data  are  shuffled  down  one  location.  The  new  data 
value  is  inserted  into  the  empty  space  in  both  the  A and  JA 
array  tables.  The  variables  which  are  used  in  the  flow 
chairt  (Fig.  B-5)  are  the  same  standard  set  plus  the  fol- 
lowing s 

IBbMP  - The  coordinate  at  which  the 
fill-in  will  be  placed. 

IFIUu  - A counter  to  track  the  number 
of  fill-in  values  which  occur 
in  the  Forward  Gaussian 
process. 

The  starting  address  vector,  lA,  is  finally  Incremented  by 
one  for  each  row  after  the  fill-in  row. 

The  counter  IFILL  can  be  used  to  check  the  relative 
performance  of  the  varioue  pivoting  strategies;  the  counter 
is  no  longer  needed  for  the  final  version  of  the  program. 


ssagai 

The  subroutine  GSSGEN  accomplishes  the  forward  G^.ussian 
Elimination  step^  This  subroutine  is  really  an  illustrative 
example  in  that  the  flow  chart  (Figs.  B-6  and  B-7)  indicates 
pivoting  based  on  some  arbitrary  strategy.  The  following 
FORTRAN  variables  aire  defined t 

ELIM(I)  An  element  of  an  array  ELIM 

which  is  declaired  as  a LOGICAL 
variable  name.  If  ELIM(I)  is 
•TRUE. , then  row  I has  teen 
used  as  a pivot  row  and  should 
not  be  used  for  substitution. 

If  BLIK(I)  is  .FALSE.,  then 
row  I is  a candidate  for  a 
pivot  or  a substitution. 

lELRCM  - TThe  row  number  of  the  current 
pivot  row. 

JBL  - The  column  number  of  the 
current  pivot  column. 

IPIV  - An  array  which  orders  the  pivot 
rows  for  the  back-solver. 

JPIV  - An  array  which  orders  the  pivot 
columns  for  the  back-solver. 

JSTAT(I)  - The  number  of  non- zeros  in  the 

I-th  column.  This  status  vector 
is  used  by  Minimtun  Row/Minimiun 
Column  pivot  strategy,  for 
exanqple. 

If  the  diagonail  pivot  strategy  were  used,  there  would 
be  no  need  for  JSTAT  or  ELIM)  in  this  case,  IBLROW  and  JEL 
would  always  equal  the  value  K.  But  in  more  complicated 
strategies,  th^sse  variables  are  necessities. 

IPIV  and  JPIV  are  used  so  that  no  row  or  column  ex- 
changes are'  necessary  in  a pivoting  strategy.  Their 
utility  is  described  further  in  the  description  of  GAUS^X. 


•4 


GSSGEN 

1 

J 

Cl«&r  ELIM 
flags  to  .P. 
Constxxict  col- 
unn  status 
(JSTAT)  if 
required 


“DO"  K«1 
through  N 


Select  \ 
Pivot  coor-  ' 
dinates  from 
strategy  / 


Rows  lELROW 
Coll  JEL 


Fetch 


"Pivor** 


[fetch] 


Divide  row 
azxl  bi  by  f 
PIVOT  / 


[rowdiv] 


Delete 

Pivot 

Element 


BLIM(IELRQW)«.T. 

IPIV(K)»IELROW 

JPIVU)«JEL 


IR0WSar#row8  found 

JOOL(l)arow  number 
of  such  a 
row 


/ Find  rows  \ 
^whexe  ELIM(I)\ 
a.F.  and  have, 

\coluoin  J a / 
JEL  / 


[delete] 


If  column  counter, 
JSTAT,  is  used, 
reduce  each  column 
J of  IELROW  by  1. 


Pig*  B-6.  Searcli  Portion  of 


GAUSSBX 


The  back  solution  of  the  Gaussi2ui  Elimination  is 
done  by  the  GAUSSBX  subroutine.  The  operation  corresponds 
to  Eq  (7),  In  the  forward  solution,  the  ordering  of  rows 
and  columns  is  stored  in  the  arrays  IPIV  and  JPIV.  The  flow 
chart  for  GAUSSBX  (Fig.  B-8)  shows  how  the  proper  values 
are  computed.  In  the  example  of  Fig.  B~l,  if  the  second  row, 
third  column  were  the  last  pivot  coordinates,  then  IPIV(N) 
and  JPIV(N)  would  be  ”2"  and  ”3"  respectively.  The  first  Xj^ 
to  be  solved  would  therefore  be  x^*  Had  the  rows  and  columns 
been  interchanged,  X3  would  have  appeared  in  the  last  element 
in  the  jc  vector.  Thus,  the  us 2 of  row  and  column  pivot  ar- 
rays saves  much  extra  programming  and  execution  time  re- 
quired by  interchanges  in  A and  x. 

Subroutine  Summary 

The  calling  sequence  for  a driver  program  for  these 
subroutines  would  be  as  follows* 

PACKl  - To  store  the  compressed  data. 

GSSGEN  - For  the  forward  solution. 

GAUSSBX  - For  the  back  solution. 

There  are  other  subroutines  used  in  the  MFP  program;  but 
their  structure  is  very  simple,  and  reference  to  the 
listing  would  be  sufficient  for  further  study. 
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APPENDIX  C 


Core  Storage  for  Gaussian 
Sparse  Solvers 


^>pendix  C 


Table  C-I 

Core  Storage  for  Gaussian  Sparse  Solvers 


Program  Subroutine  Length  (Octal)  Data  Length  (Octal) 


Y3MP  - 

SORDBR 

411 

NSRORD 

151 

SSFAC 

351 

NSNFAC 

321 

NSBSLV 

141 

ZEROSYM 

317 

Total t 

2134 

20207 

SIMULT  - 

SIMULT 

432 

PIVSEL 

100 

Total t 

532 

17662 

MFP  - 

FETCH 

31 

ROWDIV 

22 

DELETE 

57 

STORE 

134 

GAUSSBX 

56 

#1 

GAUSSF 

> Diagonal 

413 

Totals 

757 

16666 

#2 

GAUSSFP 

- Partial 

417 

Totals 

763 

16666 

#3 

CAUSSFF 

- Full 

570 

Totals 

1034 

16666 

#4 

GAVSSm 

- Min  Rom 

771 

Nin  Col 

Total 

1335 

16666 

#5 

GAUSSML 

- ?Un  Row 

601 

Max  ala 

Totals 

1145 

16666 

S-toxags  for  imy  arbitrary  lOO-by-lOO  matrix,  3JI  cparsity. 
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APPENDIX  D 


Standard  Test  Matrices 


y^>pendix  E 


Intermediate  Test  Matrices 


This  appendix  contains  the  descriptions  of  each  of  the 
next  eight  test  matrices  and  tables  of  performance  for  five 
sparse  solvers. 


Descriptions  of  Intermedxate  Test  Matrices 

Test  Matrices  five  through  twelve  were  all  of  size 
n ■ 100,  with  non-zeros  at  3 to  5%  sparsity.  All  values 
aj^j  and  the  elements  of  b for  each  matrix  were  randomly  gen- 
erated by  a standard  function  in  the  CDC  6600  computer.  In 
some  cases,  the  coordinates  of  the  a*  • values  were  generated 


randomly;  thus,  not  only 
( random  structures. 

Test  Matrices  5,6,7 

Test  Matrix  8 

Test  Matrix  9 

Test  Matrices  10,11 

Test  Matrix  12 

O 


were  random  values  tested,  but  also 


- Three  different  tridiagonal 
asatrices . 

- An  arbitrary  tridiagonal 
matrix  with  an  additional  2% 
non-zero  structure  arbitrarily 
placed  within  ^5  diagonals  of 
the  main  diagonal. 

- Siadlar  to  Test  Matrix  8 ex- 
cept the  2%  extra  non-zeros 
are  contained  within  ^10 
diagonals. 

- Similar  to  Test  Matrix  8 ex- 
cept that  the  2%  extra  non- 
zeros  are  contained  within 
^15  diagonals. 

- Similar  to  Test  Matiix  8 ex- 
cept that  the  2%  extra  non- 
zeros are  arbitrarily  assigned 
throughout  the  entire  matrix 
structure. 
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The  measured  criteria  contained  in  Tables  E-1  through 
E>V  are  the  order  of  magnitude  of  the  average  scalar  rt;.id> 
ual  error  (Eq  [29] )f  the  execution  time,  the  number  of  times 
a fill-in  value  was  generated,  and  the  number  of  elements 
which  the  Gaussian  algorithm  deleted^  In  the  case  of  Gauss- 
ian Elimination  algorithms,  the  normalized  pivot  elements 
were  also  deleted  because  they  were  understood  to  be  exactly 
1.0  in  value. 


Table  E-I 

Intermediate  Tests  with  SIMULT 


Test 

Matrix 

Error 

Fill-ins 

Deletions 

Time 

(Seconds) 

# 5 

10*2 

97 

295 

0^26 

# 6 

w 

1 

o 

97 

295 

0.27 

# 7 

10*^ 

97 

295 

0.30 

# 8 

00 

1 

o 

336 

734 

0.33 

# 9 

1 

o 

987 

1385 

0.66 

#10 

OO 

1 

o 

H 

871 

1269 

0.64 

#11 

10-10 

1247 

1645 

0.89 

#12 

10-11 

2168 

2566 

1.70 

ZIBST  at  10~^0 

• 
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Tabl^  E-H 

Xnt.eno«diate  Tests  witn  MPP2  (Partial  Pivoting) 


Test 

Matrix 

Error 

Fill-ins 

Deletions 

Time 

( Seconds ) 

# 5 

10*^^ 

59 

258 

0.36 

# 6 

61 

260 

0.39 

# 7 

10-^3 

58 

257 

0.36 

# e 

10“^^ 

291 

512 

1.02 

# 9 

10--i2 

699 

721 

1.82 

#10 

10“^3 

930 

874 

2.19 

#11 

10-13 

1044 

946 

2.46 

#12 

10-13 

4038 

2405 

21.11 

Intermediate 

Table  E-III 
Tests  with  MFP3 

(Pull  Pivoting) 

Test 

Matrix 

Error 

Fill-ins 

Deletions 

Time 

( Seconds ) 

# 5 

10-1 

232 

305 

0.73 

# 6 

10*1 

233 

305 

0.76 

# 7 

10^1 

218 

302 

0.74 

# & 

10^ 

1149 

827 

3.40 

# 9 

loO 

2002 

1267 

7.77 

#10 

10° 

2153 

1105 

7.79 

#11 

10° 

2129 

1409 

9.17 

#13 

10*1 

3815 

2177 

26.82 
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Table  E-IV 

Intermediate  Tests  with  MPP4  (Min  Row/Min  Col) 


Test 

Matrix 

Error 

Pill-ins 

Deletions 

Time 

( Seconds ) 

# 5 

10-1^ 

0 

199 

0.37 

# 6 

e 

1 

o 

0 

199 

0,37 

# 7 

10“^^ 

0 

199 

0.43 

# a 

1 

o 

207 

464 

1.00 

# 9 

loO 

792 

936 

2.38 

#10 

951 

958 

3.06 

#11 

failed 

- coi^xited 

an  infinite 

operand 

#12 

10^^ 

2117 

1750 

9.55 

Table  B-V 


Intermediate  Rests 

with  MPP5 

(Min  Row/Maoc  Element) 

Test 

Matrix 

Error 

Pill-ins 

Deletions 

Time 

(Seconds) 

# 5 

10”^^ 

59 

258 

0.44 

# 6 

e 

1 

o 

61 

260 

0.45 

# 7 

10”^^ 

56 

257 

0.45 

# 8 

lO"^ 

249 

504 

1.03 

# 9 

loO 

629 

771 

1.90 

#10 

10^^ 

933 

1051 

3.79 

#11 

icP 

717 

779 

2.10 

#12 

10° 

1713 

1453 

6.35 
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APPENDIX  G 


Gaiussiain  Elimination  Program  Listing 


..is;'  > 


Appendix  G 


Gaussian  Eli^nitir  tion  Program  Listing 

This  Appendix  contains  the  listing  of  the  Gaussian 
Elimination  Sparse  Solver  called  "SMART."  This  particular 
program  is  not  the  production  model;  the  reader  will  note 
that  the  sparse  matrix  is  read  in  from  a permanent  disk  file 
called  "TAPE2."  The  only  8ignific2ait  difference  between 
this  listing  and  the  listing  of  the  production  model  is  that 
all  rcfarerces  to  logical  unit  2 have  been  replaced  by  list> 
directed  "READ‘«’  commrnds  from  data  cards. 
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